Abstract 

RainyDay is a Python-based platform that couples rainfall remote sensing data with Stochastic 
Storm Transposition (SST) for modeling rainfall-driven hazards such as floods and landslides. 
SST effectively lengthens the extreme rainfall record through temporal resampling and spatial 
transposition of observed storms from the surrounding region to create many extreme rainfall 
scenarios. Intensity-Duration-Frequency (IDF) curves are often used for hazard modeling but 
require long records to describe the distribution of rainfall depth and duration and do not provide 
information regarding rainfall space-time structure, limiting their usefulness to small scales. In 
contrast, RainyDay can be used for many hazard applications with 1-2 decades of data, and 
output rainfall scenarios incorporate detailed space-time structure from remote sensing. Thanks 
to global satellite coverage, RainyDay can be used in inaccessible areas and developing countries 
lacking ground measurements, though results are impacted by remote sensing errors. RainyDay 
can be useful for hazard modeling under nonstationary conditions. 
Software Availability 

Name of Software: RainyDay Rainfall Hazard Modeling System 

Developer: Daniel B. Wright 

Contact: Daniel B. Wright; Address: Room 1269 Engineering Hall, 1415 Engineering Drive, 
Madison, WI 53706, USA; Email: danielb.wright@wisc.edu 


Year first available: 2015 


Required hardware and software: RainyDay requires Python 2.7 (it has not been tested with Python 
3.0 or higher) with Numpy and Scipy. The NetCDF4 and GDAL APIs and Python libraries are 
also required. RainyDay will run on Macintosh, Linux, and Windows machines with the proper 
APIs and Python libraries. 


Cost: Free. RainyDay is currently available by request at 
https://bitbucket.org/danielbwright/rainyday. Open-source release under version 3.0 of the GNU 
General Public License (http://www.gnu.org/licenses/gpl-3.0.en.html) is planned, with 
unrestricted public access to the code repository. 
1. Introduction 


Rainfall-driven hazards such as floods and landslides are the most common natural disasters 
worldwide, and amongst the most devastating. A growing number of computational hazard 
models are available to transform extreme rainfall inputs into hazard predictions, including 
distributed hydrologic models for the movement of water into and through river systems (e.g., 
Smith et al., 2004); hillslope stability and run-out models for landslide initiation and subsequent 
motion (e.g. Brenning, 2005 and Preisig and Zimmermann, 2010, respectively); and hydraulic 
models for flood wave propagation in channels and floodplains (e.g., Horritt and Bates, 2002). 
These models have seen significant advances in recent decades, and have become key 
components in probabilistic hazard and risk assessment in fields such as natural catastrophe risk 
insurance, infrastructure design, and land-use planning. The hazard predictions produced by 
these models tend to be highly sensitive to the amount, timing, and spatial distribution of rainfall 
inputs. Unfortunately, progress on developing realistic rainfall inputs for probabilistic hazard and 
risk assessment has been relatively limited. This paper introduces RainyDay, a Python-based 
platform that addresses this shortcoming by coupling rainfall remote sensing data from satellites 
or other sources with a technique for temporal resampling and spatial transposition known as 
Stochastic Storm Transposition (SST) to generate highly realistic probabilistic rainfall scenarios. 


Rainfall inputs for long-term hazard and risk assessment require a probabilistic description of 
three interrelated components: duration, intensity, and space-time structure. Efforts to jointly 
model these components are usually referred to as rainfall frequency analysis, a simple term that 
belies the complexity of the physical phenomena and analytical methods involved. The 
probability structure of the first two components, rainfall duration and intensity, has been a focus 
of research and application for decades (see U.S. Weather Bureau, 1958 and Yarnell, 1935 for 
early examples). These two components are strongly linked and together they determine the 
probability distribution of rainfall volume (or depth) at a point or over an area. The third 
component, space-time structure, describes the spatial and temporal variability of rainfall and is 
determined by storm size, horizontal velocity, and the temporal evolution of spatial rainfall 
coverage. Space-time structure can thus be understood as describing the “when” and “where” of 
extreme rainfall, whereas intensity and duration describe “how much.” 
Rainfall space-time structure can be an important hazard determinant. For example, a rainstorm 
that is short-lived and small in spatial extent may pose a significant flash flood threat in a narrow 
mountain valley or urban area, but may not represent a hazard on a larger river system. 
Conversely, a month-long rainy period could lead to flooding on a major river due to the gradual 
accumulation of water in soils, river channels, and reservoirs, but may never feature a short-lived 
“burst” of rainfall sufficiently intense to cause flash flooding at smaller scales. Similarly, a storm 
that covers a large area or passes over a series of valleys could lead to more widespread landslide 
or debris flow occurrences than a smaller or stationary storm. Rainfall space-time structure and 
its importance as a hazard trigger, therefore, must be understood within the context of the 
particular geography and scale in question. Due to its complexity, rainfall space-time structure 
has traditionally been less well understood than intensity and duration, and its representation in 
hazard modeling has been less sophisticated. 


The probability distribution of rainfall volume for a given duration is usually derived from rain 
gages and distilled into Intensity-Duration-Frequency (IDF) curves, such as those provided by 
the National Oceanic and Atmospheric Administration’s (NOAA) Atlas 14 (Bonnin et al., 2004). 
Long records (spanning many decades) are generally needed to define the extreme tail of such 
distributions. The challenge of measuring extreme rainfall over long time periods and over large 
areas using rain gages has hindered IDF estimation in many developed countries, while the lack 
of data in poor countries and in inaccessible terrain means that IDF estimation using such 
methods is virtually impossible in many locations. Furthermore, the ability to measure rainfall 
space-time structure at a high level of detail using dense networks of rain gages is nonexistent 
outside of a handful of wealthy cities and research-oriented observation networks. 
“Regionalization” (the pooling of hazard information over a larger area in order to inform 
analyses at particular locations; see, e.g., Alexander, 1963 for an early discussion of rainfall 
regionalization and Stedinger et al., 1993 for a review) has helped with IDF estimation using 
short records in areas where rain gage densities are moderate or high. These techniques offer 
little help, however, in parts of the world where rain gages are few or nonexistent, and do not 
offer a framework for incorporating rainfall space-time properties into hazard estimation. Even 
where long rainfall records do exist, nonstationarity due to climate change may mean that earlier 
portions of the record are no longer representative of current or future IDF properties (e.g., Cheng 
and AghaKouchak, 2014). 


Several techniques, which generally fall under the term of design storm methods, are used in 
long-term hazard estimation to link IDF properties to space-time structure for probabilistic flood 
hazard assessment (commonly referred to as flood frequency analysis). One central design storm 
concept is to link rainfall duration to rainfall intensity via a measure of flood response time, such 
as the time of concentration (e.g. McCuen, 1998). Another attempts to estimate area-averaged 
rainfall from point-scale rainfall estimates using area reduction factors (ARFs; U.S. Weather 
Bureau, 1958). Yet another uses dimensionless time distributions such as the family of U.S. Soil 
Conservation Service 24-hour rainfall distributions (e.g. McCuen, 1998). Each of these methods 
is highly empirical, laden with assumptions (see Wright et al., 2014a; Wright et al., 2014b; 
Wright et al., 2013), valid only in certain contexts, and often misunderstood or misused (K. 
Potter, personal communication, May 6, 2015). 


SST explicitly links IDF properties with rainfall space-time properties, providing certain advantages 
over design storm methods. Similar to other regionalization techniques, SST aims to effectively 
“lengthen” the period of record by using nearby observations, albeit using a fundamentally 
different approach involving temporal resampling and spatial transposition of rainstorms drawn 
from a catalog of observed rainfall events from the surrounding region. The inclusion of nearby 
storms at least partially addresses the difficulty of accurately estimating rainfall hazards using 
short records. SST can be used to estimate rainfall IDF properties and also to facilitate modeling 
of interactions of rainfall space-time structure with geographic features (such as hillslopes and 
river networks) at the appropriate spatial and temporal scales. It accomplishes this by generating 
large numbers of extreme rainfall “scenarios,” each of which has realistic rainfall structure based 
directly on observations. 


Alexander (1963), Foufoula-Georgiou (1989), and Fontaine and Potter (1989) describe the 
general SST framework, while Wilson and Foufoula-Georgiou (1990) use the method for rainfall 
frequency analysis and Gupta (1972) and Franchini et al. (1996) use it for flood frequency 
analysis. At the time, however, the method was of limited practical use due to the lack of 
detailed rainfall datasets with large areal coverage. Those studies also did not focus on the 
aspects of SST related to rainfall space-time structure or their implications for hazard modeling. 


The recent advent of satellite-based remote sensing provides a relatively low-cost means of 
measuring extreme rainfall over large parts of the globe at moderately high spatial and temporal 
resolution (30 minutes to 3 hours, 4 km to 25 km), while ground-based weather radar offers 
higher-resolution estimates (5-60 minutes, typically 1 to 4 km) over smaller regions. While the 
accuracy of rainfall remote sensing can be poor (particularly in cases of satellite-based estimates, 
e.g. Mehran and AghaKouchak, 2014; and mountainous regions, e.g. Nikolopoulos et al., 2013, 
Stampoulis et al., 2013), such data nonetheless offer unprecedented depictions of rainfall over 
large areas. This creates a variety of opportunities for hazards research and practice at various 
scales, ranging from forecasting and post-event analysis to long-term hazard assessment. 


In the context of SST, the ongoing accumulation of remote sensing data to lengths of 10-20 years 
or more “unlocks” many of the as-yet unrealized opportunities offered by SST. Wright et al. 
(2013) demonstrated the coupling of SST with a 10-year high resolution radar rainfall dataset for 
IDF estimation, and the method was extended to flood frequency analysis for a small urban 
watershed using a distributed hydrologic model in Wright et al. (2014b). These two papers, along 
with Wright et al. (2014a), show that commonly-used design storm practices (ARFs, 
dimensionless time distributions) have serious shortcomings in representing the multi-scale 
space-time structure of extreme rainfall and critical interactions of this structure with 
watershed and river network features. Wright et al. (2014b) also show that when SST is coupled 
with rainfall remote sensing data and a distributed hydrologic model, it can reproduce the role 
that this structure plays in determining multi-scale flood response. The RainyDay software 
described in this paper was developed to facilitate the use of SST in conjunction with rainfall 
remote sensing data. 


Though SST was developed in the context of flood hazard estimation, it may prove useful for 
rainfall-triggered landslides and other mass movements, subject to the limited accuracy of 
remote sensing data in steep terrain and other limitations that will be discussed subsequently. 


Rainfall space-time structure governs the temporal distribution of rainfall volume onto individual 
hillslopes, as well as the number of hillslopes subject to rainfall. In addition, steep landslide-prone 
terrain often has poorer rain gage coverage than lowland areas due to limited accessibility, 
suggesting that remote sensing rainfall estimates are potentially useful in such regions, 
particularly if improvements in accuracy can be realized (e.g., Shige et al., 2013). 


Section 2 provides a description of the SST methodology. Section 3 discusses the specific 
implementation of SST in RainyDay, and some of the software’s important features. Section 4 
provides sample results from RainyDay and sensitivity analyses using different input rainfall 
datasets for rainfall and flood frequency analysis in order to illustrate its capabilities and some of 
its limitations, including for flood frequency analysis in nonstationary conditions. Section 5 
includes discussion and concluding remarks. 
2. The SST Methodology 


In this section, we provide a step-by-step methodology for SST-based rainfall frequency analysis 
for a user-defined geographic “area of interest” A of arbitrary shape. Higher-level description of 
software features is left to Section 3, but it merits mention that in RainyDay, A can be a single 
remote sensing pixel, a rectangular area containing multiple pixels, or a contiguous area defined 
by a user-supplied polygon shapefile. 


The following five steps describe the SST methodology, as implemented in RainyDay: 

1. Identify a geographic transposition domain A’ that encompasses the area of interest A. 
One could confine A’ to regions with homogeneous extreme rainfall properties (e.g., flat 
areas far from large water bodies and topographic features). However, such homogeneity 
would likely be difficult to determine rigorously in practice, and, regardless, such a strict 
interpretation is likely to be overly limiting. RainyDay offers several diagnostic aids, 
discussed in Section 3.3, that help the user to understand rainfall heterogeneity over the 
region A’ and to improve the performance of the SST procedure in cases where rainfall 
heterogeneities do exist. Additional issues related to the selection of A’ are explored in 
Section 4.3. 
2. Identify the largest m temporally non-overlapping storms in A’ from an n-year rainfall 
remote sensing dataset, in terms of rainfall accumulation of duration t over an area with the same 
size, shape, and orientation as A. For example, the principal axis of the Turkey River 
watershed in northeastern Iowa in the central United States is oriented roughly northwest-southeast, 
and the watershed has an area of 4400 km². In this case, the m storms are those associated 
with the m highest t-hour rainfall accumulations over an area of 4400 km² with the same 
size, shape, and orientation as the Turkey River watershed. We refer to this set of storms 
henceforth as a “storm catalog,” with the same geographic extent as A’ and the same 
spatial and temporal resolution as the input rainfall data. We refer to the m storms in the 
storm catalog henceforth as “parent storms.” In RainyDay, the user can specify whether 
to exclude certain months (such as wintertime) from the storm catalog. Previous studies 
have shown that a low bias can be introduced in high-exceedance probability (i.e., 
frequent, low-intensity) events if m is small (e.g., Foufoula-Georgiou, 1989; Franchini et 
al., 1996; Wilson and Foufoula-Georgiou, 1990; see Wright et al., 2013 for a discussion). 
The sensitivity of SST results to the choice of m and A’ is explored in detail in Section 
4.3, but m ≈ 10n generally minimizes the low bias for frequent events and would likely 
be a good starting point for new analyses. Low exceedance probability (i.e., rare) events 
are less sensitive to the choice of m (see Section 4.3). 


In RainyDay, duration t is a user-defined input, and as long as t is neither very short nor 
very long relative to the time scales of hazard response in A, subsequent hazard modeling 
results will be relatively insensitive to the chosen value. In this respect, the duration t in 
SST differs conceptually from design storm methods, in which hazard response is 
intrinsically sensitive to the user-specified duration; this feature is indeed one of the 
chief advantages of SST over design storm methods for multi-scale flood hazard 
estimation (see Wright et al., 2014b for analysis and discussion). In the case of SST-based 
flood frequency analysis, t should be at least as long as the time of concentration 
and preferably somewhat longer. 


3. Randomly generate an integer k, which represents a “number of storms per year.” In 
previous SST literature, the assumption was made that k follows a Poisson distribution 
with a rate parameter $\lambda$ storms per year. The m parent storms are selected such that an 
average of $\lambda = m/n$ storms per year are included in the storm catalog. For example, if 
m = 100 storms are selected from a ten-year remote sensing dataset, then $\lambda = 100/10 = 10.0$ 
storms per year. RainyDay will generate k using either the Poisson distribution or an 
empirical distribution, discussed in Section 3.3. If the Poisson distribution is selected, 
RainyDay will automatically calculate $\lambda$ based on user-specified m and the length of the 
input dataset. 


4. Randomly select k parent storms from the storm catalog. For each selected parent storm, 
transpose all rainfall fields associated with that storm by an east-west distance $\Delta x$ and a 
north-south distance $\Delta y$, where $\Delta x$ and $\Delta y$ are drawn from the distributions $D_X(x)$ and 
$D_Y(y)$, which are bounded by the east-west and north-south extents of A’, respectively. 
The motion and structure of the parent storm are unaltered during transposition; only 
the location is changed. The distributions $D_X(x)$ and $D_Y(y)$ were taken to be uniform in 
Wright et al. (2013) and Wright et al. (2014b), but RainyDay offers additional options, 
described in Section 3.3. We illustrate this step schematically in Figure 1. For each of the 
k transposed storms, compute the resulting t-hour rainfall accumulation averaged over A. 

Step 4 can be understood as temporal resampling and spatial transposition of observed 
storm events within a probabilistic framework to synthesize one year of heavy rainfall 
events over A’ and, by extension, over A. RainyDay and previous SST efforts retain the 
largest (in terms of rainfall intensity) of the k events for subsequent steps and discard the 
k − 1 remaining events, though in principle these events could be retained. The single 
retained storm can be understood as a “synthetic” annual rainfall maximum, analogous to 
the annual rainfall maxima that are extracted from rain gage records for rainfall 
frequency analysis. It should be noted that these rainfall events do not form a continuous 
series, meaning that neither inter-storm periods nor the sequencing of the k storms is 
considered. 


5. Repeat steps 3 and 4 a user-specified number of times, $T_{max}$, in order to create $T_{max}$ years 
of t-hour synthetic annual rainfall maxima for A. RainyDay then assigns each annual 
maximum a rank i according to its rainfall intensity relative to all others. Each of these 
ranked maxima can then be assigned an annual exceedance probability $p_e = i/T_{max}$. 
Exceedance probability $p_e$ is the probability in a given year that an event of equal 
or greater intensity will occur. The “return period” or “recurrence interval” $T_r$, commonly 
used in hazard analysis, is simply $T_r = 1/p_e$, so if $T_{max} = 10^3$, it is possible to directly infer 
exceedance probabilities of $1.0 \geq p_e \geq 10^{-3}$ (recurrence intervals of $1 \leq T_r \leq 10^3$). Each of these 
rainfall events can then serve as one datum of an empirical IDF estimate or as a rainfall 
scenario for hazard modeling. 
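The resampling loop in Steps 3 to 5 can be condensed into a short NumPy sketch. It is not RainyDay's code: it assumes a rectangular A, uniform transposition, a Poisson-distributed k with rate $\lambda = m/n$, and a hypothetical catalog array of shape (m, time, y, x) in which each parent storm covers A’ for exactly the duration t. The function name `sst_annual_maxima` and the synthetic gamma-distributed rainfall are illustrative only.

```python
import numpy as np


def sst_annual_maxima(catalog, area_shape, lam, t_max, rng=None):
    """Sketch of SST Steps 3-5 for a rectangular area of interest A.

    catalog    : array (m, time, y, x); each parent storm spans exactly the
                 duration t over the whole transposition domain A'
                 (a hypothetical storage layout, assumed for illustration)
    area_shape : (ny, nx) size of A in pixels
    lam        : Poisson rate, lambda = m / n storms per year
    t_max      : number of synthetic years, T_max
    """
    rng = np.random.default_rng() if rng is None else rng
    m, _, ny_dom, nx_dom = catalog.shape
    ny, nx = area_shape
    maxima = np.zeros(t_max)
    for year in range(t_max):
        k = rng.poisson(lam)                        # Step 3: storms this year
        for _ in range(k):
            storm = catalog[rng.integers(m)]        # Step 4: random parent storm
            # Uniform transposition, expressed as a random placement of A
            # relative to the unaltered storm field (a simplification).
            y0 = rng.integers(ny_dom - ny + 1)
            x0 = rng.integers(nx_dom - nx + 1)
            depth = storm[:, y0:y0 + ny, x0:x0 + nx].sum(axis=0).mean()
            # Keep only the largest of the k events: the synthetic annual maximum
            maxima[year] = max(maxima[year], depth)
    # Step 5: rank i = 1 is the largest; p_e = i / T_max and T_r = 1 / p_e
    ranked = np.sort(maxima)[::-1]
    p_e = np.arange(1, t_max + 1) / t_max
    return ranked, p_e, 1.0 / p_e


rng = np.random.default_rng(42)
m, n = 100, 10                                      # 100 parent storms, 10-year record
catalog = rng.gamma(0.5, 2.0, size=(m, 6, 20, 30))  # synthetic rainfall fields
ranked, p_e, t_r = sst_annual_maxima(catalog, (4, 6), lam=m / n, t_max=1000, rng=rng)
```

Moving A over a fixed storm field is used here only because it gives the same relative storm position with simpler indexing; it ignores any storm rainfall that a real shift would carry in from outside A’. Years with k = 0 contribute a zero annual maximum.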
Figure 1: Depiction of the SST procedure for a single storm consisting of four time intervals $t_1 \ldots t_4$ (horizontal axis: east-west distance). The blue 
ellipses illustrate the time evolution of an arbitrary rainfall isohyet derived from remote sensing observations, 
while the green ellipses show the time evolution of this same isohyet after transposition. Adapted from Wright 
et al. (2013). 
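Step 2 of the procedure, building the storm catalog, can be approximated by a greedy search: compute every t-step accumulation averaged over each A-sized window in A’, then repeatedly keep the largest one that does not overlap in time with a storm already in the catalog. This is an assumed approach for illustration (hypothetical function `build_storm_catalog`, a rectangular grid-aligned A, no month exclusion), not necessarily how RainyDay performs the search.

```python
import numpy as np


def build_storm_catalog(rain, area_shape, t_steps, m):
    """Greedy sketch of Step 2 for a rectangular area of interest A.

    rain       : array (time, y, x) of rainfall over A'
    area_shape : (ny, nx) size of A in pixels
    t_steps    : number of time steps in the duration t
    m          : number of parent storms to keep
    Returns (start, y0, x0, depth) tuples, largest depth first.
    """
    nt, ny_dom, nx_dom = rain.shape
    ny, nx = area_shape
    # Moving t-step accumulation at every pixel via a cumulative sum in time
    csum = np.concatenate([np.zeros((1, ny_dom, nx_dom)), np.cumsum(rain, axis=0)])
    accum = csum[t_steps:] - csum[:-t_steps]            # (nt - t_steps + 1, y, x)
    # Average over every A-sized window using a 2-D integral image
    ii = np.pad(accum, ((0, 0), (1, 0), (1, 0))).cumsum(axis=1).cumsum(axis=2)
    box = (ii[:, ny:, nx:] - ii[:, :-ny, nx:] - ii[:, ny:, :-nx]
           + ii[:, :-ny, :-nx]) / (ny * nx)
    flat = box.reshape(box.shape[0], -1)
    best, where = flat.max(axis=1), flat.argmax(axis=1)  # best window per start time
    catalog, taken = [], np.zeros(nt, dtype=bool)
    for start in np.argsort(best)[::-1]:
        if taken[start:start + t_steps].any():
            continue                                    # overlaps a storm already kept
        y0, x0 = np.unravel_index(where[start], box.shape[1:])
        catalog.append((int(start), int(y0), int(x0), float(best[start])))
        taken[start:start + t_steps] = True
        if len(catalog) == m:
            break
    return catalog


rain = np.zeros((48, 10, 12))
rain[5:8, 2:4, 3:6] = 10.0                              # one synthetic 3-step storm
catalog = build_storm_catalog(rain, area_shape=(2, 3), t_steps=3, m=2)
# catalog[0] → (5, 2, 3, 30.0)
```

Selecting greedily from the top guarantees temporal non-overlap but is only one reasonable rule; other definitions of a storm's time window would change which events qualify.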


3. RainyDay Software 


3.1. Overview of Software 
We wrote RainyDay to make SST more accessible and to streamline the code for speed and 
ease of use using Python. The majority of subroutines utilize the Scipy (Jones et al., 2011) and 
Numpy packages (van der Walt et al., 2011). To enhance speed, certain RainyDay subroutines call C 
code through Scipy’s “weave” functionality 
(http://docs.scipy.org/doc/scipy/reference/tutorial/weave.html). Figure 2 shows a schematic of 
the workflow in RainyDay. 


While the ranking of rainfall events described in Step 5 of the SST methodology in Section 2 is 
based on rainfall intensity averaged over A, RainyDay will create NetCDF4 files 
(http://www.unidata.ucar.edu/software/netcdf) that contain the transposed rainfall scenarios with 
full depictions of rainfall space-time structure at the native spatial and temporal resolution of the 
input. This is an important feature because space-time structure, and not just average rainfall 
intensity over area A and duration t, is important in determining hazard response. For example, 
one rainfall scenario may produce a more severe flood response than another scenario, even if it 
has a lower overall average rainfall intensity over A and t, due to interactions with watershed 
features (see Section 3.2 of this paper for discussion and Wright et al., 2014b for analysis). 


We will provide the RainyDay source code, examples, and user documentation upon request, and 
intend to release it under version 3 of the GNU General Public License 
(http://www.gnu.org/copyleft/gpl.html) once we have completed sufficient testing and 
documentation. The code is currently not parallelized, but shared-memory parallelization may 
be added in the future. Computational time is determined mainly by the size of the input dataset 
(record length n, input resolution, and geographic size of A and A’), while other factors, such as 
m, t, $T_{max}$, and N, can also affect runtime. Computational speed, even without parallelization, is not 
prohibitive on a modern desktop or laptop computer (several seconds to several hours for typical 
configurations and input datasets). 


To ensure accessibility for users inexperienced with Python, all of the necessary Python modules 
are supported within recent versions of the Anaconda Python distribution from Continuum 
Analytics (https://store.continuum.io/cshop/anaconda). The user must install NetCDF4 libraries 
and any requisite dependencies. If the user wishes to use shapefile functionality, necessary for 
defining A to be a shape other than a rectangle or a single rainfall pixel, the GDAL library 
(http://www.gdal.org) and any necessary dependencies must also be installed. 
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314 Figure 2: Flow chart demonstrating the workflow of RainyDay. 
315 


316 3.2. SST Internal Variability 

317 In RainyDay, the user specifies N, the number of 7),..-year long “ensemble members” to be 
318 generated. This enables the examination of “internal variability,” i.e. how much variation in 
319 rainfall intensity is possible for a given p. for a given input rainfall dataset and set of user- 


320 defined parameters. For example, if the user specifies Tnax = 10° and N= 100, then there will be 
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100 intensity estimates for each p. between 1.0 and 10°. RainyDay will automatically generate a 
text file containing the results of this rainfall frequency analysis, including the rainfall mean, 
minimum, and maximum (or, optionally, a quantile interval) for each p., computed from the N 


ensemble members. 
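The per-$p_e$ summary across the N ensemble members can be sketched as follows, assuming each member's $T_{max}$ annual maxima are sorted in descending order so that a given column shares the same $p_e$ in every member. The function name, the Gumbel-distributed synthetic members, and the array layout are illustrative assumptions; writing the text file is omitted.

```python
import numpy as np


def ensemble_summary(ensemble, quantiles=None):
    """Per-p_e statistics across N ensemble members (illustrative sketch).

    ensemble  : array (N, T_max); row j holds member j's synthetic annual
                maxima sorted from largest to smallest, so column i - 1
                corresponds to p_e = i / T_max in every member
    quantiles : optional (low, high) pair, e.g. (0.05, 0.95), used instead
                of the minimum and maximum
    """
    t_max = ensemble.shape[1]
    p_e = np.arange(1, t_max + 1) / t_max
    mean = ensemble.mean(axis=0)
    if quantiles is None:
        low, high = ensemble.min(axis=0), ensemble.max(axis=0)
    else:
        low, high = np.quantile(ensemble, quantiles, axis=0)
    return p_e, mean, low, high


rng = np.random.default_rng(0)
members = rng.gumbel(50.0, 10.0, size=(100, 1000))  # N = 100, T_max = 1000 (synthetic)
ensemble = -np.sort(-members, axis=1)               # sort each member in descending order
p_e, mean, low, high = ensemble_summary(ensemble)
```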


If the scenarios generated by RainyDay are fed through a hazard model, then the ensemble 
spread will propagate through to generate ensemble hazard estimates. A useful and interesting 
feature of SST and RainyDay that is not examined in this paper, but is discussed at length in 
Wright et al. (2014b), is that the exceedance probability of rainfall and of subsequent hazards can 
be decoupled using SST, particularly if some realistic scheme is used to account for the initial 
conditions in A (such as soil moisture or baseflow). Consider the example where N = 1 and $10^3$ 
rainfall scenarios ($T_{max} = 10^3$) are created as input to a distributed flood hydrologic model. One of 
these rainfall scenarios has $p_e = 0.01$ (in terms of watershed-average t-hour rainfall depth over an 
area A). Even if initial conditions are kept constant across all $T_{max}$ simulations, the $p_e$ of the peak 
discharge or volume predicted by the model for this particular scenario need not be equal to 0.01, 
since the space-time structure of the rainfall scenario and its interactions with watershed and 
river network features can dampen or magnify the flood severity. If variability in initial 
conditions within the hazard model is considered, this dampening or magnification effect can 
be even greater. This property of SST contrasts with design storm methods, which typically 
assume a 1:1 relationship between the $p_e$ of rainfall and the resulting hazard, though variability in 
initial conditions could in principle be used with design storm approaches to produce some 
degree of “decoupling” of rainfall and hazard $p_e$. Setting N > 2 allows for examination of 
differences in hazard $p_e$ for a given rainfall $p_e$, or vice versa, which could lead the way to more 
detailed examination of the role of rainfall space-time structure (see Wright et al., 2014b) or 
initial conditions in probabilistic hazard estimation. RainyDay provides one simple scheme for 
creating variability in initial conditions, described in Section 3.5. 


It should be pointed out that the ensemble spread generated in RainyDay is not completely 
comparable to the confidence intervals of more traditional rainfall or flood frequency analyses. 
The latter reflect statistical uncertainty associated with parameter estimation, which can be 
derived in different ways (e.g., bootstrapping, profile likelihood, etc.). Therefore, it might not be 
reasonable to expect the uncertainty ranges produced by RainyDay to be comparable to the 
confidence intervals of other IDF estimates. Like nearly all frequency analyses and IDF 
estimation methods, the ensemble spread generated by RainyDay does not consider measurement 
error, which, as mentioned previously, can be substantial. Since the ensemble spread is for a 
given set of user-defined values such as A’ or m, it does not consider uncertainty associated with 
these choices. Analyses in Section 4.3 show how such uncertainties can be assessed, but 
fundamentally this requires manipulating the size or composition of the storm catalog through 
the choice of user-defined values, necessitating multiple distinct runs of RainyDay. 


Ensemble spread is shown throughout Section 4 to illustrate various aspects of SST-based 
rainfall and flood frequency analysis. If the user is only interested in examining internal 
variability of SST-based rainfall IDF, then the number of ensemble members can be large (e.g. 
N=100). If the user wishes to perform hazard simulations, however, N should be selected with 
consideration of the computational cost associated with large numbers of simulations, which can 
be substantial depending on the particular hazard model. To help manage the number of 
simulations required, the user can specify a rainfall return period threshold, below which output 
scenarios will not be created. For example, if the user specifies a 5-year threshold, no rainfall 
scenarios with a rainfall depth less than the 5-year return period depth will be written, which 
reduces the number of hazard simulations by 80% for a given value of N while still retaining the 
most extreme scenarios. 
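The 80% figure follows from $p_e = i/T_{max}$: a scenario meets a return-period threshold of $T_{thr}$ years (a symbol introduced here for convenience) only if its rank satisfies $i \leq T_{max}/T_{thr}$, so a fraction $1/T_{thr}$ of each member's scenarios is kept. A small check, with $T_{max} = 1000$ and N = 100 chosen only for illustration and a hypothetical helper name:

```python
def scenarios_above_threshold(t_max, n_members, threshold_years):
    """Count rainfall scenarios kept under a return-period threshold (sketch).

    With p_e = i / T_max, the scenario of rank i has return period T_max / i,
    so it meets the threshold only if i <= T_max / threshold_years.
    """
    kept_per_member = int(t_max // threshold_years)
    kept = kept_per_member * n_members
    reduction = 1.0 - kept / (t_max * n_members)
    return kept, reduction


kept, reduction = scenarios_above_threshold(t_max=1000, n_members=100, threshold_years=5)
# kept → 20000 scenarios instead of 100000, a reduction of 80%
```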


3.3. Rainfall Heterogeneity and Non-Uniform Spatial Transposition 

A common criticism of SST is that its validity is restricted to regions with homogeneous extreme 
rainfall properties. As previously mentioned, depending on how rigidly this criterion is enforced, 
the method would be limited to small, flat regions far from topographic features, water bodies, 
etc. It is unclear how homogeneity would be determined, particularly with the paucity of extreme 
rainfall data in most regions. Instead, steps can be taken to use SST in more varied geophysical 
settings. Regardless of the setting, the selection of A’ requires an understanding of regional 
rainfall patterns and of the intrinsic assumptions of SST. Though more work is needed to 
understand the geographic limits of the applicability of RainyDay in complex terrain, the work of 
England et al. (2014) provides an example of SST in complex terrain. 

RainyDay provides several tools to help understand the issue of rainfall heterogeneity, and, to 
some extent, to mitigate it. First, RainyDay produces a map showing the location of the rainfall 
centroids for all storms in the storm catalog, overlaid on a smoothed field of the spatial 
probability of storm occurrence within A’. This spatial probability of occurrence map is 
generated by applying a two-dimensional Gaussian kernel smoother to the (x,y) locations of the 
rainfall centroids for all the storms in the storm catalog. This smoothed field is then normalized 
such that the sum of all grid cells is 1.0, thus creating a two-dimensional probability density 
function of storm occurrence. A second plot shows these rainfall centroids overlaid with the 
average rainfall per storm across A’. These diagnostic plots assist in understanding regional 
variations in storm occurrences and rainfall over A’. Examples of these diagnostic plots for a 
region A’ encompassing most of the state of Iowa in the central United States are shown in 
Figure 3. The top panel suggests that storms are somewhat more frequent in the southernmost 
third or so of the transposition domain, along with slightly elevated activity in the 
northeast quadrant. The bottom panel shows somewhat higher average storm rainfall in these two 
areas. Caution should be taken when drawing firm conclusions from these diagnostic plots, 
however, since rainfall heterogeneities evident in both storm occurrences and average storm 
rainfall may be the result of spatial biases in rainfall remote sensing estimates or of randomness 
in the climate system over the relatively short remote sensing record, rather than of “true” 
heterogeneity in the underlying rainfall hydroclimate. 
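The spatial probability-of-occurrence map described above can be sketched with a separable Gaussian kernel applied to a grid of centroid counts, followed by normalization to a unit sum. The kernel width `sigma`, the grid layout, the synthetic centroids, and the function name are assumptions for illustration; RainyDay's own smoother and bandwidth may differ.

```python
import numpy as np


def storm_occurrence_pdf(cent_y, cent_x, grid_shape, sigma):
    """Smoothed 2-D probability of storm occurrence over A' (sketch).

    cent_y, cent_x : integer grid indices of parent-storm rainfall centroids
    grid_shape     : (ny, nx) of the A' grid; each side should exceed the
                     kernel length 2 * ceil(3 * sigma) + 1 for mode="same"
    sigma          : kernel standard deviation in grid cells (assumed value)
    """
    counts = np.zeros(grid_shape)
    np.add.at(counts, (cent_y, cent_x), 1.0)        # number of centroids per cell
    radius = int(np.ceil(3 * sigma))
    offsets = np.arange(-radius, radius + 1)
    kernel = np.exp(-0.5 * (offsets / sigma) ** 2)
    kernel /= kernel.sum()
    # A 2-D Gaussian kernel is separable: smooth along x (axis 1), then y (axis 0).
    # mode="same" treats cells outside A' as zero; renormalizing restores a unit sum.
    smooth = np.apply_along_axis(np.convolve, 1, counts, kernel, mode="same")
    smooth = np.apply_along_axis(np.convolve, 0, smooth, kernel, mode="same")
    return smooth / smooth.sum()                    # all grid cells sum to 1.0


rng = np.random.default_rng(1)
cy = np.clip(rng.normal(22, 3, 200).round().astype(int), 0, 29)  # centroids cluster near row 22
cx = rng.integers(0, 40, 200)
pdf = storm_occurrence_pdf(cy, cx, grid_shape=(30, 40), sigma=2.0)
```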


Additional optional diagnostic outputs include static and animated rainfall maps for each storm in the storm catalog (not shown). These storm rainfall maps are useful for diagnosing “bad data,” particularly in rainfall datasets derived from ground-based weather radar, which can be contaminated by radar beam blockage and other unrealistic artifacts. RainyDay allows for the exclusion of user-identified storm periods from subsequent analysis, though anomalous periods must be identified by the user (i.e., no automatic data quality checking is provided).


The two-dimensional density function of the spatial probability of storm occurrence can optionally be used as the basis for non-uniform spatial transposition (providing the D_x(x) and D_y(y) described in Step 4 and Figure 1 in Section 2), so that the spatial distribution of storm occurrences is preserved between the input data and the output rainfall scenarios and IDF estimates. Section 4.3 examines the impact of this optional feature on results for the Iowa study region, along with potential implications.
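One way non-uniform transposition could sample storm locations from such a field is sketched below. This is an illustration under stated assumptions: it draws grid cells directly from the joint two-dimensional field, whereas the text describes the sampling in terms of D_x(x) and D_y(y), and `sample_transposition_cells` is a hypothetical helper rather than RainyDay's implementation.

```python
import numpy as np


def sample_transposition_cells(weights, n, rng):
    """Draw n transposition grid cells with probability proportional to weights.

    Hypothetical helper for illustration, not RainyDay code.
    weights : 2-D array of non-negative values, e.g. a smoothed storm-occurrence
              field (renormalized here so that it sums exactly to 1)
    n       : number of storm transpositions to draw
    rng     : numpy.random.Generator
    Returns (rows, cols) index arrays of the chosen cells.
    """
    w = np.asarray(weights, dtype=float)
    p = w.ravel() / w.sum()
    idx = rng.choice(p.size, size=n, p=p)  # categorical draw over all cells
    return np.unravel_index(idx, w.shape)


# Usage: a hypothetical 2 x 2 field in which one cell is strongly favored
rng = np.random.default_rng(42)
field = np.array([[0.0, 0.1],
                  [0.2, 0.7]])
rows, cols = sample_transposition_cells(field, 10_000, rng)
share = np.mean((rows == 1) & (cols == 1))  # close to 0.7
```

Passing a field of equal weights recovers spatially uniform transposition, so the same sampling routine can serve both options.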


It is important to note that this approach only addresses the spatial heterogeneity of storm occurrences, not spatial variations in the climatology of rainfall intensity (due to topography or other factors). For example, if A’ contains both a flat plain and an adjacent mountain range, the probability of storm occurrence will vary across A’. This variation will be captured in the two-dimensional density function of spatial storm probability and, using the optional non-uniform spatial transposition scheme, will be reflected in RainyDay outputs. In this example, however, rainfall intensity from these storms will also vary according to the underlying topography. The current transposition scheme in RainyDay cannot explicitly account for this intensity variation, which is likely to be a serious constraint in many regions.

Figure 3: Example of diagnostic plots produced by RainyDay for 24-hour duration rainfall from the Stage IV rainfall dataset (described in Section 4.1) over a region encompassing the state of Iowa, United States. Top: shading indicates spatial probability of storm occurrence. Bottom: shading indicates the average rainfall per storm from the same storm catalog. Black dots show the rainfall centroids for each storm in the storm catalog. Dot size in both panels indicates relative storm total rainfall depth. Key RainyDay parameters: m=150 storms, A’=[40° to 44° N, 90° to 96° W], A is a single Stage IV rainfall pixel (approximately 16 km²), T_max=1000, and t=24 hours.


3.4. Empirical Temporal Resampling 

As mentioned in Step 3 of the SST procedure described in Section 2, previous SST work has employed the assumption that the annual number of storms follows a Poisson distribution, which in turn serves as the basis for the temporal resampling of storms (i.e., for generating the number of storms per year k that will be spatially transposed). RainyDay supports Poisson-based resampling, but also allows the use of an empirical distribution. This distribution is derived from the number of storms that enter into the storm catalog from each calendar year in the rainfall input dataset. Then, during the temporal resampling step, k is obtained by randomly selecting one of these values. This feature may be useful in regions where storm occurrences exhibit strong clustering (i.e., where there is strong evidence for more storms in some years and fewer in other years for persistent climatological reasons; e.g., Villarini et al., 2013). Section 4.3 examines the impact of this choice on SST results. Other discrete probability distributions, such as the two-parameter negative binomial (Pascal) distribution, can also be used to model clustered count data. RainyDay does not currently use such distributions, since short (typically 10-20 year) remote sensing records may yield poor parameter estimates owing to the limited number of statistical degrees of freedom.
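The two temporal resampling options can be contrasted in a few lines. The sketch below is a hypothetical illustration, not RainyDay code: `resample_k` draws k for each synthetic year either from a Poisson distribution whose rate is the mean number of catalog storms per year, or by resampling the observed annual counts with replacement.

```python
import numpy as np


def storms_per_year_counts(storm_years, record_years):
    """Count catalog storms in each calendar year of the record (zeros included)."""
    storm_years = np.asarray(storm_years)
    return np.array([(storm_years == y).sum() for y in record_years])


def resample_k(counts, n_years, rng, method="poisson"):
    """Draw the number of storms k for each synthetic year (hypothetical helper).

    'poisson'  : k ~ Poisson(lambda), lambda = mean storms per year in the catalog
    'empirical': k is one of the observed annual counts, drawn with replacement
    """
    counts = np.asarray(counts)
    if method == "poisson":
        return rng.poisson(counts.mean(), size=n_years)
    if method == "empirical":
        return rng.choice(counts, size=n_years, replace=True)
    raise ValueError("method must be 'poisson' or 'empirical'")


# Usage with a hypothetical catalog of m=150 storms from a 13-year record
rng = np.random.default_rng(7)
record_years = range(2002, 2015)
catalog_years = rng.integers(2002, 2015, size=150)  # hypothetical storm years
counts = storms_per_year_counts(catalog_years, record_years)
k_poisson = resample_k(counts, 1000, rng, "poisson")
k_empirical = resample_k(counts, 1000, rng, "empirical")
```

Because the empirical option only returns counts that appear in the record, it preserves between-year variability (including clustering) that a Poisson model, whose variance equals its mean, cannot represent; the tradeoff is that it cannot generate counts outside the observed range.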


3.5. “Spin-up” of Initial Conditions

A key issue in the modeling of rainfall-driven hazards is to adequately represent initial conditions. In many flood and landslide modeling efforts, the most critical of these initial conditions is antecedent soil moisture, while other states such as river baseflow and water table position may also be relevant. Many hydrologic models allow for the specification of such initial conditions, and thus many design storm-based hazard modeling efforts rely on an assumed soil moisture state, such as a typical or fully saturated condition. Such assumed approaches have previously been used with SST (Wright et al., 2014b), and could be combined with the rainfall scenarios generated via RainyDay. This approach has the downside, however, that the true variability of antecedent soil moisture is not captured in hazard predictions. This is particularly important in regions in which heavy rainfall does not necessarily occur in the same season as high soil moisture conditions. A second approach that can capture this variability would be to derive a distribution of antecedent soil moisture from previous long-term (ideally multi-decadal) model simulations. Since there can be substantial variation in how soil moisture is represented in different hazard models, ideally the same model would be used for these long-term simulations and for the hazard scenario modeling. RainyDay offers an alternative option, however, in which initial soil moisture can be “spun up” within the hazard model to represent seasonally realistic initial conditions without the need for long-term simulation.


The spin-up procedure is described here for a single rainfall scenario. The month of occurrence of the rainfall scenario is identified based on the “parent storm” that created it. RainyDay then identifies the set of X-day periods (where X is a user-defined spin-up period) preceding all parent storms that occur within a user-defined number of months of the date of occurrence of the parent storm. One of those X-day periods is randomly selected and prepended to the rainfall scenario. This scheme helps to ensure that spin-up conditions are reasonable for the given season. It also helps to ensure that spin-up conditions have realistic temporal correlations when prepended to the rainfall scenario (for example, if heavy storms have historically tended to be preceded by several days of moderate rain, but not by several days of heavy rain, this tendency will be preserved). It is important to note, however, that the 10- to 20-year records typical of rainfall remote sensing may not capture the full variability of “true” initial conditions.
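The selection of a spin-up period can be sketched as follows. Assumptions: the month window is measured on the calendar month of each storm regardless of year and wraps around the end of the year, storm start dates are given as `datetime.date` objects, and `pick_spinup_window` and `month_distance` are hypothetical helpers, not RainyDay functions.

```python
import random
from datetime import date, timedelta


def month_distance(m1, m2):
    """Distance in calendar months, wrapping around the year (Dec-Jan = 1)."""
    d = abs(m1 - m2) % 12
    return min(d, 12 - d)


def pick_spinup_window(parent_date, storm_dates, max_months, spinup_days, rng):
    """Randomly choose an X-day spin-up period for one rainfall scenario.

    Hypothetical helper. Candidates are the spinup_days-long periods immediately
    preceding every catalog storm whose calendar month lies within max_months of
    the parent storm's month. Returns (start, end) dates; rainfall between them
    would be prepended to the scenario.
    """
    candidates = [d for d in storm_dates
                  if month_distance(d.month, parent_date.month) <= max_months]
    if not candidates:
        raise ValueError("no catalog storms fall within the month window")
    storm_start = rng.choice(candidates)
    return storm_start - timedelta(days=spinup_days), storm_start


# Usage with hypothetical catalog storm start dates
rng = random.Random(3)
catalog = [date(2008, 6, 8), date(2010, 7, 22), date(2013, 4, 17), date(2011, 11, 2)]
start, end = pick_spinup_window(date(2014, 6, 30), catalog, max_months=1,
                                spinup_days=6, rng=rng)
# end is 2008-06-08 or 2010-07-22: April and November are more than 1 month from June
```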


This prepending procedure creates rainfall scenario output files of X+t duration. The modeler can then assign an average initial soil moisture condition to initialize each model run and use the rainfall scenario as input. Soil moisture within the model will then evolve over the spin-up period based on the rainfall (or lack thereof), evapotranspiration, and other model-estimated fluxes. It is important to point out that this spin-up procedure has several limitations. First, it has storage and computational costs, since it can substantially increase the size of the rainfall scenario output files generated by RainyDay and the length of each hazard simulation. The importance of these costs depends on the size of A, the resolution of the input rainfall dataset, and the computational burden of the hazard model. Second, a practical spin-up period may be too short to establish slowly evolving states. In Section 4.2, for example, we limit X to 6 days, for a total rainfall duration of 10 days. This spin-up period is likely sufficient to spin up moisture in the upper soil layers, but not to fully establish baseflow or deeper groundwater flow. The modeler should evaluate the tradeoffs between longer X and the associated storage and computational costs.
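The storage cost of spin-up can be estimated with simple arithmetic. The helper below is a hypothetical back-of-the-envelope calculation that assumes an uncompressed float32 grid at a fixed time step and ignores file-format overhead and compression.

```python
def scenario_size_mb(n_cells, storm_hours, spinup_days, dt_hours=1.0, bytes_per_value=4):
    """Approximate uncompressed size in MB of one rainfall scenario (hypothetical).

    Assumes a regular grid of float32 values (4 bytes) at a fixed time step
    dt_hours, and ignores file-format metadata and compression.
    """
    n_steps = (storm_hours + 24.0 * spinup_days) / dt_hours
    return n_cells * n_steps * bytes_per_value / 1e6


# Usage: roughly 275 cells of 16 km^2 cover a 4400 km^2 watershed
with_spinup = scenario_size_mb(275, storm_hours=96, spinup_days=6)     # 240 hourly steps
without_spinup = scenario_size_mb(275, storm_hours=96, spinup_days=0)  # 96 hourly steps
ratio = with_spinup / without_spinup  # 2.5, independent of n_cells
```

For the configuration of Section 4.2 (t = 96 hours, X = 6 days), each hourly scenario grows from 96 to 240 time steps, a factor of 2.5 regardless of grid size; how hazard model run time scales depends on the model.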


3.6. Parametric Rainfall Intensity

Instead of relying on the rainfall intensity derived from a remote sensing input dataset, a user might prefer to use a parametric distribution to impose rainfall depths on the rainfall output scenarios. RainyDay supports this option. The user can supply a t-hour rainfall depth distribution. This distribution is then applied to the output rainfall scenarios via a normalization procedure that assumes that the supplied distribution corresponds to the annual maximum t-hour rainfall intensity for a single rainfall grid cell. Rainfall space-time structure is still derived from the remote sensing data. It should be noted, however, that this approach may be problematic when the resolution of the input remote sensing dataset is coarse relative to the spatial coverage of the rainfall measurement device upon which the parametric distribution is based (for example, the 16-625 km² footprint of many satellite rainfall datasets relative to the 0.1 m² sampling area of a single rain gage). This procedure is also problematic in regions where such parametric rainfall distributions might be the synthesis of “mixture distributions” of distinct storm types in which rainfall intensity is intrinsically linked to rainfall space-time structure (e.g., Smith et al., 2011), since RainyDay does not distinguish between different storm types. Currently only the three-parameter generalized extreme value distribution (Walshaw, 2013) is supported, though it would be straightforward to add additional choices.
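The GEV quantile function, x(p_e) = mu + (sigma/xi)[(-ln(1 - p_e))^(-xi) - 1], and one possible way to impose it on SST output are sketched below. The normalization procedure RainyDay actually uses is not detailed here, so the rank-based rescaling in `rescale_to_gev` is an assumption for illustration: synthetic annual maxima are ranked, matched to GEV depths with the same plotting-position exceedance probability, and converted to multiplicative factors that leave the space-time pattern of each scenario unchanged. Both function names are hypothetical.

```python
import math

import numpy as np


def gev_quantile(p_exceed, mu, sigma, xi):
    """Depth with annual exceedance probability p_exceed under GEV(mu, sigma, xi).

    Uses the sign convention in which xi > 0 gives a heavy upper tail.
    Note: scipy.stats.genextreme uses the opposite sign (c = -xi).
    """
    y = -math.log(1.0 - p_exceed)  # reduced variate
    if abs(xi) < 1e-12:
        return mu - sigma * math.log(y)  # Gumbel limit as xi -> 0
    return mu + sigma / xi * (y ** (-xi) - 1.0)


def rescale_to_gev(annual_max_depths, mu, sigma, xi):
    """Return one multiplicative factor per synthetic year (hypothetical scheme).

    Annual maxima are ranked (rank 1 = largest), given Weibull plotting positions
    rank / (n + 1), and mapped to the GEV depth with the same exceedance
    probability. Multiplying a year's scenario field by its factor imposes the
    GEV depth while keeping the scenario's space-time pattern.
    """
    depths = np.asarray(annual_max_depths, dtype=float)
    n = depths.size
    order = np.argsort(depths)[::-1]  # indices from largest to smallest
    factors = np.empty(n)
    for rank, idx in enumerate(order, start=1):
        target = gev_quantile(rank / (n + 1.0), mu, sigma, xi)
        factors[idx] = target / depths[idx]
    return factors


# Usage: hypothetical GEV parameters (mm) for a 24-hour annual maximum
depth_100yr = gev_quantile(0.01, mu=60.0, sigma=15.0, xi=0.1)
```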
4. Rainfall and Flood Case Studies 


4.1. Rainfall IDF

We generated IDF results for six durations from 3 to 96 hours over a range of p_e between 0.5 and 10^-3 using RainyDay for single rainfall grid cells in the vicinity of Iowa City, Iowa (Figure 4), using rainfall data from Stage IV (Lin and Mitchell, 2005) and version 7.0 of the Tropical Rainfall Measuring Mission Multi-Satellite Precipitation Analysis (TMPA; Huffman et al., 2010). Stage IV is available through the National Weather Service (NWS) National Centers for Environmental Prediction and provides hourly, 4 km resolution rainfall estimates by merging data from the NWS Next-Generation Radar network (NEXRAD; Crum and Alberty, 1993) with rain gages and, in some instances, satellite rainfall estimates. Stage IV has been extensively used in studies of extreme rainfall and flooding. All Stage IV analyses in this paper use data from 2002 to 2014. TMPA merges passive microwave, active radar, and infrared observations from multiple satellites to create a near-global (±50° latitude) rainfall dataset with 3-hourly, 0.25° (approximately 25 km) resolution. Unless otherwise noted, TMPA analyses in this study use the final “research version” of TMPA from 1998-2014, which includes a monthly rain gage-based bias correction. For the results in Figure 4, and most subsequent analyses in this study, A’ is the rectangular area shown in Figure 3. A is set to a single rainfall pixel and each run consists of 100 ensemble members (N=100), producing 100 estimates for each p_e. We compare these results with rain gage-based IDFs from NOAA Atlas 14. Atlas 14 uses L-moment regionalization techniques to combine observations from a large number of rain gages. The Atlas 14 analysis for Iowa uses 369 rain gages, many of which have records beginning in the late 19th century.
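How N ensemble members of T_max synthetic years translate into IDF estimates with an ensemble spread can be sketched as below. The plotting position p_e = i / T_max for rank i is an assumption, chosen because it reproduces the 10^-3 lower limit for T_max = 1000; `sst_idf` is a hypothetical helper, and its input is assumed to already hold the annual-maximum t-hour depth at the grid cell of interest for each synthetic year.

```python
import numpy as np


def sst_idf(annual_maxima):
    """Summarize SST annual maxima into exceedance curves (hypothetical helper).

    annual_maxima: array of shape (N, T_max); row j holds the T_max synthetic
                   annual-maximum t-hour depths of ensemble member j.
    Returns p_e and the ensemble mean, minimum and maximum depth at each p_e
    (minimum and maximum together give the ensemble spread).
    """
    am = np.sort(np.asarray(annual_maxima, dtype=float), axis=1)[:, ::-1]  # descending
    t_max = am.shape[1]
    p_e = np.arange(1, t_max + 1) / t_max  # rank i -> p_e = i / T_max (assumed)
    return p_e, am.mean(axis=0), am.min(axis=0), am.max(axis=0)


# Usage: random Gumbel draws stand in for real SST output (N=100, T_max=1000)
rng = np.random.default_rng(0)
annual_maxima = rng.gumbel(80.0, 20.0, size=(100, 1000))
p_e, mean_depth, low, high = sst_idf(annual_maxima)
idx_100yr = 9  # rank 10 of 1000, i.e. p_e = 0.01
```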


The range of IDF durations shown in Figure 4 emphasizes that RainyDay is flexible in terms of the selection of duration t. RainyDay-based IDF estimates using Stage IV exhibit slight systematic underestimation relative to Atlas 14 across a range of p_e, except at the 96-hour duration, where there is a close match. RainyDay-based IDF estimates using TMPA, meanwhile, closely match Atlas 14 for high p_e (except at the 3-hour duration) and underestimate for low p_e at all durations. Underestimation using RainyDay may be attributed to the mismatch in spatial resolution between the remote sensing data (approximately 16 km² for Stage IV and 625 km² for TMPA) and the rain gages (approximately 0.1 m²). We have refrained from using ARFs to convert the Atlas 14 point IDF estimates into area-averaged IDFs, since the ARF concept has practical and conceptual limitations (see Wright et al., 2014a). Both the slight overestimation of rainfall depth from TMPA (relative to Stage IV) for more frequent events and the underestimation for rarer events using both datasets could potentially be explained by conditional bias (i.e., bias that depends on rain rate; Ciach et al., 2000; see Habib et al., 2009 for evidence of conditional biases in TMPA). The convergence between Stage IV-based RainyDay IDFs and Atlas 14 with increasing duration is consistent with both conditional bias and spatial mismatch effects, both of which are known to diminish with increased temporal aggregation. While not definitive, the results in Figure 4 do not clearly point to shortcomings associated with the SST procedure itself.


In order to highlight both the potential for IDF estimation and probabilistic hazard assessment in data-sparse regions using RainyDay with satellite remote sensing, and some of the associated challenges, we compare 24-hour IDF curves generated using RainyDay for various satellite rainfall datasets for the vicinity of Iowa City (Figure 5). This comparison includes two versions of TMPA: the aforementioned final version, which includes monthly rain gage-based bias correction, and TMPA-RT, which is produced in near real time, does not feature bias correction, and runs from 2000-2014. It also includes two versions of the 30-minute resolution, 8 km Climate Prediction Center (CPC) Morphing Technique (CMORPH; Joyce et al., 2004): CMORPH Corrected, which uses a daily rain gage-based bias correction scheme, and CMORPH Raw, which does not. Finally, it includes the 60-minute, approximately 4 km version of Precipitation Estimation from Remotely Sensed Information Using Artificial Neural Networks Global Cloud Classification System (PERSIANN-GCCS; Sorooshian et al., 2000), which does not use gage-based bias correction. The results in the top panel of Figure 5 show relatively good agreement between point-scale NOAA Atlas 14 IDFs and single-pixel RainyDay-based IDFs for bias-corrected TMPA and PERSIANN-GCCS, particularly considering the spatial sampling mismatch between the remote sensing data and Atlas 14 mentioned previously, while results based on CMORPH Corrected show systematic underestimation.


The middle panel of Figure 5 shows how RainyDay can be used to examine the effect of rain gage-based bias correction on satellite-based IDF estimates. In the case of CMORPH, the Raw version overestimates rainfall intensity at all p_e, while results for the Corrected version show that the daily-scale bias correction scheme seems to overcompensate, leading to systematic underestimation. TMPA-RT also overestimates at all p_e, though not as severely as CMORPH Raw, while the monthly bias correction scheme used in the final version of TMPA appears to offer superior performance to the daily-scale routine used by CMORPH Corrected. It is not immediately clear why this is the case, particularly since details of the bias correction procedure for CMORPH are not readily available, but relevant considerations include the effect of rainfall detection errors on bias correction (Tian et al., 2007) and the challenge of correcting for conditional biases at short time scales (Wright et al., 2014c). The apparent strong performance of the monthly bias correction is encouraging in the context of Integrated Multi-satellitE Retrievals for GPM (IMERG), a state-of-the-art rainfall dataset that combines various elements from TMPA, CMORPH, and PERSIANN, including TMPA’s monthly bias correction (Huffman et al., 2014). The IMERG dataset is not analyzed in this study since the full retrospective dataset is not yet available.


The bottom panel of Figure 5 shows results similar to those in the top panel, but with A set to a 0.5° by 0.5° (approximately 2500 km²) box centered on Iowa City. The results demonstrate that RainyDay can easily generate spatially aggregated rainfall IDF curves. This is not achievable using standard gage-based IDF curves without the use of ARFs, which, as previously mentioned, have been shown to have limitations. For this reason, we omit an area-averaged gage-based IDF curve from the bottom panel of Figure 5.
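Area-averaged IDFs differ from single-pixel IDFs because rainfall is averaged over A before the t-hour accumulation is computed. A minimal sketch, assuming a gridded rainfall array ordered (time, row, column) and a rectangular A given as index slices; `max_area_average_accumulation` is a hypothetical helper, not RainyDay code.

```python
import numpy as np


def max_area_average_accumulation(rain, t_steps, row_slice, col_slice):
    """Largest t-step accumulation of rainfall averaged over a rectangular area A.

    Hypothetical helper for illustration.
    rain      : array (n_time, n_rows, n_cols) of rainfall depth per time step [mm]
    t_steps   : duration t expressed in time steps
    row_slice, col_slice : slices selecting the grid cells inside A
    """
    area_mean = rain[:, row_slice, col_slice].mean(axis=(1, 2))  # area-average series
    window_sums = np.convolve(area_mean, np.ones(t_steps), mode="valid")  # moving t-step sums
    return window_sums.max()


# Usage: a tiny 5-step, 2 x 2 example
rain = np.zeros((5, 2, 2))
rain[1] = 10.0         # 10 mm over the whole box in step 1
rain[2, 0, 0] = 20.0   # an intense cell in step 2
point = max_area_average_accumulation(rain, 2, slice(0, 1), slice(0, 1))  # 30.0
area = max_area_average_accumulation(rain, 2, slice(0, 2), slice(0, 2))   # 15.0
```

The intense cell is diluted when averaged over the box, which is why area-averaged depths fall below single-pixel depths for the same storm and duration.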


The results shown in Figure 5 (and also Figure 4) have implications for using RainyDay for IDF and hazard estimation in data-sparse regions using satellite remote sensing. First, there can be substantial differences in extreme rainfall estimates between satellite rainfall datasets, and these differences will propagate through to IDF estimates (and to probabilistic hazard estimates, as will be shown in Section 4.2). Furthermore, while comparison with gage-based IDFs (when available) can be used to understand these differences, spatial sampling mismatches complicate such comparisons. Findings may not be transferable across regions, since the performance of satellite rainfall retrievals varies with region and latitude (e.g., Ebert et al., 2007) and because the quality of the gage-based bias correction schemes that some satellite datasets employ will vary regionally with the density of available rain gage observations.


[Figure 4 panels: 3-, 6-, 12-, 24-, 48-, and 96-hour durations. Vertical axis: Rainfall [mm]; horizontal axes: Exceedance Probability [-] (0.5 to 0.001) and Return Period [y] (2 to 1000). Legend: RainyDay mean and ensemble spread with Stage IV rainfall; RainyDay mean and ensemble spread with TMPA rainfall; NWS Atlas 14 IDF with 90% confidence interval.]


Figure 4: Comparison of IDF curves from Atlas 14 and RainyDay using the Stage IV and TMPA rainfall datasets for 3-, 6-, 12-, 24-, 48-, and 96-hour durations. Shaded areas for RainyDay estimates denote the ensemble spread. Bars on the NOAA Atlas 14 IDF estimates denote the 90% confidence intervals. Key RainyDay parameters: m=150 storms, A’=[40° to 44° N, 90° to 96° W], A is a single rainfall pixel (approximately 16 km² for Stage IV, 625 km² for TMPA), N=100, T_max=1000. Spatially uniform transposition and Poisson-based temporal resampling are selected. Stage IV period of record is 2002-2014; TMPA period of record is 1998-2014. Analyses are restricted to the April-November period.

Figure 5: Comparison of IDF curves. Top: 24-hour duration IDF curves at the point scale from NOAA Atlas 14 and at the pixel scale from RainyDay using TMPA Final, PERSIANN-GCCS, and CMORPH Corrected rainfall datasets. Middle: 24-hour duration IDF curves at the point scale from NOAA Atlas 14 and at the pixel scale from RainyDay using TMPA-RT, TMPA Final, CMORPH Raw, and CMORPH Corrected rainfall datasets. Bottom: 24-hour duration IDF curves at the 0.5° by 0.5° scale from RainyDay using TMPA, PERSIANN-GCCS, and CMORPH Corrected rainfall datasets. Shaded areas for RainyDay estimates denote ensemble spread. Bars on the NOAA Atlas 14 IDF estimates denote the 90% confidence intervals. Key RainyDay parameters: m=150 storms, A’=[40° to 44° N, 90° to 96° W]. For the top and middle panels, A is a single rainfall pixel (approximately 625 km² for TMPA, 64 km² for CMORPH, 16 km² for PERSIANN); N=100, T_max=1000, t=24 hours. Spatially uniform transposition and Poisson-based temporal resampling are selected. The TMPA Final and CMORPH periods of record are 1998-2014, the TMPA-RT period is 2000-2014, and the PERSIANN-GCCS period of record is 2004-2014. Analyses are restricted to the April-November period.


4.2. Flood Frequency Analysis 

In this section, we present flood peak frequency analyses for the 4400 km² Turkey River watershed in northeastern Iowa using rainfall scenarios from RainyDay as inputs to the Iowa Flood Center (IFC) Model, a calibration-free distributed hydrologic modeling framework designed primarily for multi-scale flood research and application (see Cunha et al., 2012; Demir and Krajewski, 2013; Mantilla and Gupta, 2005; Moser et al., 2015; Small et al., 2013). Moser et al. (2015) provide a detailed model description, and Cunha et al. (2012) performed model validations for flood events in Iowa, showing that the performance of the IFC Model is generally comparable to that of the more heavily calibrated operational NWS SAC-SMA flood forecast model (Burnash, 1995). The model configuration used here is the same as that used by Moser et al. (2015). This study aims only to demonstrate basic features of RainyDay for flood hazard analysis and so does not provide detailed discussion of the IFC Model or comparisons with other available platforms. For a discussion of the value of calibration-free, distributed hydrologic models for multi-scale flood modeling, the reader is directed to Wright et al. (2014b) and, in particular, Cunha et al. (2012). The full multi-scale hazard estimation capabilities of SST and RainyDay can, in principle, be harnessed using any distributed hydrologic or mass wasting model, while some of the capabilities can be achieved through the use of lumped models.


A limited validation of model hydrographs is provided in Figure 6 for the 2008 and 2014 April-July periods, during which major flooding occurred throughout Iowa (see Smith et al., 2013 for a detailed examination of the hydrometeorology of the 2008 flood season). The model is run with both Stage IV and the final (gage-corrected) version of TMPA rainfall. Comparisons with U.S. Geological Survey (USGS) stream gage observations are provided at four locations, with upstream drainage areas ranging from 900 to 4000 km². All hydrographs are normalized by the median annual flood (p_e=0.5) to facilitate comparison across watershed scales. Median annual flood estimates are taken from the USGS StreamStats system (http://water.usgs.gov/osw/streamstats/). Model performance varies from event to event, but there is no clear evidence of systematic bias in the streamflow predictions as a function of event magnitude or drainage area. Predictions based on Stage IV are generally better than those based on TMPA, and in fact several time periods show serious problems with the timing of TMPA-based simulations. In the 2008 flood season, TMPA incorrectly identifies the late April event as the largest for that year, rather than the early June floods.

Figure 6: IFC model validation for 2008 and 2014 flood seasons (left and right panels, respectively) at four USGS stream gaging sites. Hydrographs are normalized by the median annual flood, which is indicated by dashed horizontal lines.


We compare observed and simulated flood peaks for the 2008-2014 April-November period (Figure 7). All observed flood peaks that exceed 100 m³ s⁻¹ are extracted from the four USGS stream gaging records. The corresponding flood peaks predicted by the IFC model are then extracted from simulated hydrographs based on Stage IV and TMPA rainfall (left and right panels of Figure 7, respectively). To allow for modest errors in flood peak timing, a 48-hour window centered on the observed peak is used to identify the corresponding simulated peaks. All peaks in Figure 7 are normalized by the median annual flood to facilitate comparison across basin scales. As a rule of thumb, peaks below the median annual flood can be considered “within bank,” while peaks above the median annual flood can be considered “out-of-bank,” meaning the flood magnitude is large enough to exceed the normal confines of the river channel and spill into the floodplain. The left panel of Figure 7 shows that, while there is modest scatter in the Stage IV-based flood peak simulations, there is no obvious systematic bias with watershed scale or event magnitude. The TMPA-based simulations in the right panel of Figure 7 exhibit greater scatter and generally poor performance, and show some low bias across a range of event magnitudes. While not exhaustive, the validation shown in Figures 6 and 7 suggests that streamflow prediction accuracy in the IFC model is driven primarily by the accuracy of the input rainfall rather than by model structure, consistent with Cunha et al. (2012), and that the limited accuracy of satellite rainfall inputs, even with gage-based bias correction, can translate into relatively poor model performance.
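The peak-matching step can be sketched as follows, assuming observed and simulated hourly discharge series on a common time axis. Identifying observed peaks as simple local maxima above the threshold is a hypothetical stand-in for however peaks were extracted from the USGS records, and `match_peaks` is not part of RainyDay or the IFC model.

```python
import numpy as np


def match_peaks(obs, sim, threshold, half_window, median_annual_flood):
    """Pair observed flood peaks with simulated peaks inside a time window.

    Hypothetical helper for illustration.
    obs, sim            : hourly discharge series [m^3 s^-1] on the same time axis
    threshold           : only observed peaks above this value are used (e.g. 100)
    half_window         : hours on each side of the observed peak (24 -> 48-h window)
    median_annual_flood : normalizing discharge for this gauge
    Returns arrays of normalized observed and simulated peaks.
    """
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    pairs = []
    for i in range(1, len(obs) - 1):
        is_peak = obs[i] > obs[i - 1] and obs[i] >= obs[i + 1]  # local maximum
        if is_peak and obs[i] > threshold:
            lo, hi = max(0, i - half_window), min(len(sim), i + half_window + 1)
            pairs.append((obs[i], sim[lo:hi].max()))  # largest simulated flow in window
    pairs = np.array(pairs).reshape(-1, 2) / median_annual_flood
    return pairs[:, 0], pairs[:, 1]


# Usage: toy series with a 1-hour half-window for brevity (24 in the text)
obs = [0, 50, 200, 150, 80, 60, 300, 100, 20]
sim = [0, 40, 100, 180, 90, 70, 150, 250, 10]
obs_norm, sim_norm = match_peaks(obs, sim, 100, 1, 100.0)  # [2.0, 3.0], [1.8, 2.5]
```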

Figure 7: Peak discharge validation for the 2008-2014 April-November period at four USGS stream gaging stations. All events for which the USGS observations exceeded 100 m³ s⁻¹ are shown, and peak discharges are normalized by the median annual flood. Simulated peaks using the IFC model with Stage IV (TMPA Final) rainfall inputs are compared with USGS observed peaks in the left (right) panel. Solid black lines indicate 1:1 correspondence, while dashed lines indicate the envelope within which the modeled values are within 50% of observed. Grey boxes in the lower left-hand corners of each panel highlight all events less than the median annual flood.

We performed IFC model simulations using RainyDay rainfall scenarios developed from both the Stage IV and final gage-corrected TMPA rainfall datasets. For each rainfall dataset, we ran ten ensemble members (i.e., N=10), each consisting of 500 rainfall scenarios (i.e., T_max=500). At any point along the modeled river system, therefore, flood peak exceedance probabilities as low as 0.002 (500-year return period) could be directly derived from the IFC Model predictions. The Stage IV and TMPA-based storm catalogs for the Turkey River each include 150 storms, drawn from the April-November rainfall record (2002-2014 for Stage IV, 1998-2014 for TMPA). A’ is an area covering most of Iowa, southwestern Wisconsin, and southeastern Minnesota in the United States, and t = 96 hours for all simulations in this section.


We initialized each simulation with a spatially uniform initial soil moisture value found to be typical for the region. Rainfall from a seasonally based six-day “spin-up” period was then prepended to each 96-hour storm period as per Section 3.5, for a total rainfall input time period of ten days. Spatial variations in both soil moisture and river flow were therefore allowed to develop in each simulation prior to the arrival of the main storm. It should be noted that restricting the rainfall record to April-November, in addition to the lack of snowfall functionality in RainyDay and of snowpack functionality in the IFC model, means that snowmelt-driven flooding is not considered in the analyses. In Iowa, snowmelt is generally a minor though non-negligible flood mechanism. We do not evaluate the accuracy of these spun-up soil moisture and river flow conditions; in fact, such evaluation is relatively challenging due to the paucity of long-term soil moisture observation records that would be needed to correlate with river flow. As discussed in Section 3.2, SST and RainyDay facilitate “decoupling” of discharge p_e from rainfall p_e. Though not demonstrated explicitly, this decoupling is reflected in the RainyDay-based frequency analyses in this section, in that spun-up initial conditions and rainfall space-time structure can produce discharge p_e values that differ from the p_e of the input rainfall scenarios.


RainyDay-based frequency analysis results are shown for five subwatersheds of the Turkey River, ranging in drainage area from approximately 460 to 4000 km² (Figure 8). Also included in Figure 8 are two types of frequency analyses derived from USGS stream gage observations, taken from Eash et al. (2013) and retrieved from the USGS StreamStats system. The first is developed using standardized methods described in Bulletin 17B (Interagency Advisory Committee on Water Data, 1982) using the log-Pearson Type III distribution (henceforth referred to as the LP3 distribution) with a regionalized skew coefficient. The second is based on regional regression equations that consider drainage basin area and shape as well as some soil and geological properties. Eash et al. (2013) report 121 years of data for Turkey River at Garber, near Eldorado, and above French Hollow, while 63 years are reported for Volga River at Littleport and 45 years for Turkey River at Spillville. It should be noted that these record lengths refer to the “historic record length” described in Section V.B.10 of Bulletin 17B and do not correspond to the lengths of the USGS annual maximum streamflow time series available on the USGS National Water Information System (http://nwis.waterdata.usgs.gov/nwis), which are much shorter. All available USGS streamflow observations for the five sites are also shown, where p_e is estimated using the Cunnane plotting position (Cunnane, 1978; p_e = (i - 0.4) / (X + 0.2), where i is the rank of the observation and X is the number of observations). Other common plotting position formulae produce similar results (not shown) and do not alter the conclusions that follow.
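The Cunnane plotting position is straightforward to compute; the helper name `cunnane_exceedance` is hypothetical. Return periods follow as 1/p_e.

```python
import numpy as np


def cunnane_exceedance(peaks):
    """Exceedance probability of each observed annual peak (Cunnane, 1978).

    Hypothetical helper. Peaks are ranked in descending order (i = 1 for the
    largest) and assigned p_e = (i - 0.4) / (X + 0.2), where X is the number
    of observations. Returns (sorted_peaks, p_e).
    """
    q = np.sort(np.asarray(peaks, dtype=float))[::-1]
    X = q.size
    i = np.arange(1, X + 1)
    return q, (i - 0.4) / (X + 0.2)


# Usage: four hypothetical annual peaks [m^3 s^-1]
q, p_e = cunnane_exceedance([300.0, 100.0, 200.0, 400.0])
return_period = 1.0 / p_e  # largest peak: p_e = 0.6 / 4.2, about 0.143 (about 7 years)
```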


For all five locations shown in Figure 8, the SST-based peak discharge estimates using TMPA are higher than those using Stage IV for p_e > 0.01, generally converging toward the Stage IV estimates as p_e decreases, and in some cases yielding lower estimates for p_e less than about 0.005. This is consistent with the rainfall IDF results from RainyDay shown in Figure 4 and is suggestive of conditional biases in the TMPA dataset. This is indeed confirmed in Figure 9, which shows watershed-specific IDF curves for the entire Turkey River watershed from RainyDay using TMPA and Stage IV. The USGS streamflow observations shown in Figure 8 agree reasonably well with the Stage IV-based estimates for p_e > 0.5, with the exception of the smallest subwatershed, Turkey River at Spillville, where Stage IV produces low peak estimates. For p_e < 0.5, there is a lack of consistency. For example, Turkey River at Garber shows higher estimates from Stage IV than the streamflow observations, while the reverse is true for Turkey River at French Hollow and near Eldorado. Deviations from the USGS observations do not show a systematic scale dependency.

Both RainyDay-based frequency analyses and the USGS streamflow observations are generally higher than the USGS frequency analyses for p_e less than about 0.2. One exception is the set of USGS observations for Turkey River at Spillville, which are lower than both the RainyDay estimates and the regional regression but generally consistent with the Bulletin 17B analysis. The regional regression results for Turkey River at Spillville are greater than the USGS regionalized LP3 estimates, while the reverse is true for the four larger subwatersheds. Interestingly, some of the USGS observations fall outside of the 90% confidence intervals of the LP3 analyses for Turkey River near Eldorado, Volga River at Littleport, and Turkey River at Garber. In the case of the latter station, the five largest floods are near or above the upper 95% confidence bound, a finding that is explored in more detail in the following paragraphs.


It should be noted that with the exception of Turkey River at Garber, the differences between the 
RainyDay-based analyses are roughly similar in magnitude to the differences between the two 
different USGS approaches. This fact, along with the underestimation shown by USGS 
frequency analyses relative to the USGS peak discharge observations at several sites, suggests 
that the RainyDay-based frequency analyses should not be dismissed out of hand as being too 
high for low p_e. In fact, as the next example shows, there is observational evidence that supports
the validity of the RainyDay-based results in light of possible nonstationarity in flooding. It 
should be noted that discharge-based frequency analyses, even in stationary situations with long 
records, are not necessarily superior to hydrologic modeling methods. Analyses by Smith et al. 
(2013) suggest that peak discharge measurement errors may be substantial for recent major
flood events in Iowa. The propagation of discharge measurement errors through frequency
analysis is poorly understood (e.g., Petersen-Overleir and Reitan, 2009; Petersen-Overleir, 2004; 
Potter and Walker, 1985). Rogger et al. (2012) reported significant differences between two 
commonly-used flood frequency analysis approaches for ten small alpine watersheds in Austria, 
one based on a stream gage-based statistical method and the other on design storm methods 
combined with a hydrologic model. The latter method produced higher discharge values than the 
former, and the authors discuss possible explanations and deficiencies in both approaches while 
concluding that in at least some situations, hydrologic modeling using rainfall inputs will
produce superior results.
Of the five USGS stream gage locations shown in Figure 8, only the gage at Garber, Iowa has a
long (82-year), unbroken annual peak discharge record. We use this record to better understand
the discrepancies between the RainyDay-based results and the USGS frequency analyses from
Eash et al. (2013), and in particular to contrast the methods in the context of potential
nonstationarity in flood processes. The top panel of Figure 10 shows the same RainyDay and
USGS frequency analyses shown in Figure 8 for Turkey River at Garber. In this case, however,
the USGS observations have been divided into two groups: one for all peaks occurring from
1933 to 1989, and the second for all peaks occurring from 1990 to 2014. The plotting-position-based
p_e is recalculated for each group of observations. The 1933-1989 subgroup shows higher
discharges than either RainyDay Stage IV or USGS discharges for p_e > 0.5, and lower discharges
for p_e less than about 0.2. The 1990-2014 subgroup, meanwhile, matches closely with the
RainyDay-based frequency analyses with Stage IV.
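Splitting the Garber record and recalculating p_e for each group, as described above, amounts to ranking each subgroup against its own sample size. A sketch under stated assumptions: the helper names are hypothetical, peaks are ranked largest-first, and the split year is passed in (1990 for the 1933-1989 and 1990-2014 groups).

```python
import numpy as np


def cunnane_exceedance(peaks):
    """Cunnane plotting position, largest peak first: p_e = (i - 0.4) / (n + 0.2)."""
    x = np.sort(np.asarray(peaks, dtype=float))[::-1]
    i = np.arange(1, x.size + 1)
    return x, (i - 0.4) / (x.size + 0.2)


def plotting_positions_by_period(years, peaks, split_year):
    """Split an annual peak series at split_year and rank each subgroup separately.

    Peaks with year < split_year go to "early"; the rest go to "late".
    Each subgroup gets its own n, so p_e is recalculated per group.
    """
    years = np.asarray(years)
    peaks = np.asarray(peaks, dtype=float)
    return {
        "early": cunnane_exceedance(peaks[years < split_year]),
        "late": cunnane_exceedance(peaks[years >= split_year]),
    }


# Hypothetical four-year series, split at 1990.
groups = plotting_positions_by_period([1988, 1989, 1990, 1991], [5.0, 7.0, 9.0, 3.0], 1990)
for name, (q, p) in groups.items():
    print(name, q, p)
```

Because each subgroup has its own n, the same discharge can receive a different p_e in each group, which is why the two subgroups plot differently against the RainyDay curves.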
Figure 8: Peak discharge analyses using RainyDay with Stage IV and TMPA rainfall remote sensing data
and the IFC Model, compared against USGS stream gage-based analyses for five subwatersheds of the
Turkey River in northeastern Iowa. Shaded areas for RainyDay estimates denote the ensemble spread. Bars
on the USGS Bulletin 17B estimates denote the 90% confidence intervals. Confidence intervals are not
available for the USGS regional regression. Key RainyDay parameters: m=150 storms, A’=[40° to 44° N, 90°
to 96° W], A is the watershed upstream of the USGS streamgage at Garber, IA, N=10, T_max=500, t_H=96 hours.
Spatially-uniform transposition and Poisson-based temporal resampling are selected. Stage IV period of
record is 2002-2014, TMPA period of record is 1998-2014. RainyDay analyses are restricted to the
April-November period.


Taken together, this suggests a regime shift toward more extreme flooding since 1990 and a
reduction in the magnitude of more average floods. Evidence of this regime shift can be seen in
the annual peak time series in the bottom panel of Figure 10. Fitting a linear trend to the
1933-2014 time series using the nonparametric Theil-Sen estimator (Sen, 1968) yields a
statistically significant (p-value<0.05) downward trend. In contrast, ordinary least squares
yields an insignificant upward trend over the same period. Thus when
the influence of the most extreme values is minimized through nonparametric statistical methods,
there is a tendency toward smaller flood peaks over time that is not evident with parametric
methods, which are more sensitive to the recent extremes.
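The contrast between the Theil-Sen and ordinary least squares trends can be reproduced on synthetic data. The sketch below (hypothetical helper names; not the analysis code used for Figure 10) computes the Theil-Sen slope as the median of all pairwise slopes and the OLS slope with `numpy.polyfit`. It does not reproduce the significance test; the text does not state which test was used, and a Mann-Kendall test is one common choice. `scipy.stats.theilslopes` offers a library implementation of the same estimator.

```python
import itertools

import numpy as np


def theil_sen_slope(t, y):
    """Theil-Sen (Sen, 1968) slope: the median of all pairwise slopes."""
    slopes = [
        (y[j] - y[i]) / (t[j] - t[i])
        for i, j in itertools.combinations(range(len(t)), 2)
        if t[j] != t[i]
    ]
    return float(np.median(slopes))


def ols_slope(t, y):
    """Ordinary least squares slope from a first-degree polynomial fit."""
    slope, _intercept = np.polyfit(t, y, 1)
    return float(slope)


# Synthetic series (not the Garber record): a steady decline of 1 unit/yr,
# followed by three very large recent peaks.
t = np.arange(20)
y = 100.0 - t
y[17:] = 300.0
print("Theil-Sen:", theil_sen_slope(t, y))  # negative: the bulk of the series declines
print("OLS:      ", ols_slope(t, y))        # positive: pulled upward by the last 3 values
```

Only 51 of the 190 pairwise slopes involve the three large values, so the median stays at the background decline while the least-squares fit is dominated by them.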


The top panel of Figure 10 shows that the period of apparent elevated flood activity is well 
captured by RainyDay, while the preceding period is not, presumably because the IFC model 
reflects recent land use changes and because the input rainfall data are relatively recent. In 
general, whether or not this constitutes a strength or limitation of RainyDay depends on the 
underlying causation of nonstationary flood activity. If flood nonstationarity results from a 
climate-driven secular trend in extreme rainfall, then the results from RainyDay using relatively
short and recent rainfall remote sensing records should be understood as more “up-to-date”
estimates of flood frequency compared to approaches, such as the USGS analyses, that use 
longer stream gage or rain gage records. The same is true if there is a secular trend in flooding 
due to urbanization or other land-use changes, so long as these changes are properly incorporated 
into the hydrologic model. In the case of Iowa, flooding has been shown to be affected by land- 
use change (Villarini and Strong, 2014) and by climate change (Mallakpour and Villarini, 2015). 


If, on the other hand, flood or rainfall nonstationarity has a periodic structure due to a slowly
varying climate mode, then the results from SST may only adequately reflect the true flood
frequency for the phase of the mode that overlaps with the remote sensing record. It should also
be recognized that a period of higher or lower flood activity at a particular location could result
from pure randomness (i.e., in the absence of both secular and periodic trends). SST should be
relatively robust to this possibility through the sampling of storms from a larger region.
Figure 9: IDF analyses for Turkey River using RainyDay with Stage IV and TMPA rainfall remote sensing
data. Shaded areas for RainyDay estimates denote the ensemble spread. Key RainyDay parameters: m=150
storms, A’=[40° to 44° N, 90° to 96° W], A is the 4400 km² watershed upstream of the confluence with the
Mississippi River. N=100, T_max=500, t_H=96 hours, and spatially-uniform transposition and Poisson-based
temporal resampling are selected. Stage IV period of record is 2002-2014, TMPA period of record is
1998-2014. Analyses are restricted to the April-November period.
Figure 10: Top panel—four peak discharge analyses for the location of the USGS stream gage at Garber, IA:
RainyDay with Stage IV and TMPA rainfall and USGS frequency analyses using regional regression
relationships and Bulletin 17B methods. Shaded areas for RainyDay estimates denote the ensemble spread.
Bars for the Bulletin 17B-based analysis denote the 90% confidence intervals. Confidence intervals are not
available for the USGS regional regression. Bottom panel—annual peak discharge time series for 1932-2014
for the Garber gage. Linear trend lines in the bottom panel use nonparametric Theil-Sen regression (Sen,
1968) and ordinary least squares (OLS). Key RainyDay parameters: m=150 storms, A’=[40° to 44° N, 90° to
96° W], A is the watershed upstream of the USGS streamgage at Garber, IA, N=10, T_max=500, t_H=96 hours.
Spatially-uniform transposition and Poisson-based temporal resampling are selected. Stage IV period of
record is 2002-2014, TMPA period of record is 1998-2014. RainyDay analyses are restricted to the
April-November period.
4.3 SST Sensitivity to Record Length and User-defined Parameters

In this section, we examine the sensitivity of SST to the length of the input dataset and to 
different user-defined parameters and options introduced in Sections 2 and 3. Specific topics that 
are examined include the optional non-uniform spatial transposition (Section 3.3), empirically- 
based temporal resampling (Section 3.4) and the size of the transposition domain A’. In all cases, 
it should be kept in mind that the specific results pertain to the Iowa study area and may not be 
generalizable to other locations. The intention is to demonstrate some important concepts and 
pitfalls associated with RainyDay and to provide a possible framework for assessing performance
in different locations and applications.


The core concept behind SST is “space-for-time substitution,” in which storms over a larger 
region help to inform estimates of rare rainfall in a particular subregion. A common critique of 
coupling SST with rainfall remote sensing datasets is that such data records are relatively short 
(approximately 10 to 20 years at time of writing) and thus may not contain sufficient numbers of 
extreme events at the regional scale to leverage this substitution property and accurately recreate 
the properties of rare rainfall events. To examine this critique, we turn to a longer dataset: CPC- 
Unified, a daily rain gage-based gridded rainfall dataset that has a spatial resolution of 0.25° over 
the conterminous United States (Chen et al., 2008; Xie et al., 2007). Though the spatial and 
temporal resolution of CPC-Unified is generally insufficient for fine-scale flood modeling, its 
long record—1948 to present—makes it ideal for evaluating the sensitivity of SST-based IDF 
estimates to record length.

We examined several stationarity measures over the transposition
domain A’ (which, as in Section 4.1, roughly encompasses the state of Iowa), including the
average number of storms per year and the mean, median, and standard deviation of storm
rainfall depth (results not shown). None of these measures revealed significant temporal trends,
generally consistent with Villarini et al. (2011). This may contradict the apparent flood
nonstationarity in the Turkey River watershed discussed in Section 4.2, or may point to land-use
change as the predominant source of nonstationarity in Turkey River, but rigorous examination
is beyond the scope of this paper.


We use a bootstrapping approach to examine variability in IDF estimates derived from the CPC-Unified
data using RainyDay and how this variability evolves as the length of the record
increases. All IDF estimates in this section are for 1-day rainfall averaged over a 0.5° by
0.5° box. We generate n-year long input rainfall datasets by randomly selecting n years of
CPC-Unified data without replacement from the 1948-2014 period. Each of these datasets is then used
as the basis for a single run of RainyDay with 100 ensemble members and with m = 10n (leading
to λ = 10 storms per year). We repeat this procedure to create 25 datasets for each value of
n = 10, 20, 30, 40, and 50 years.
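The subsampling step can be sketched as follows. It only builds the year lists and storm-catalog sizes; running RainyDay on each dataset is not shown, and the function name and arguments are hypothetical. Years are drawn without replacement, as described above, and m = 10n keeps λ = m/n fixed at 10 storms per year.

```python
import random


def make_subsampled_records(n, n_datasets=25, first=1948, last=2014,
                            storms_per_year=10, seed=None):
    """Build n_datasets input records of n distinct years each (drawn without replacement).

    Returns (years, m) pairs, where m = storms_per_year * n is the storm-catalog size,
    so that lambda = m / n = storms_per_year.
    """
    rng = random.Random(seed)
    pool = list(range(first, last + 1))  # 1948-2014 inclusive: 67 years
    records = []
    for _ in range(n_datasets):
        years = sorted(rng.sample(pool, n))
        records.append((years, storms_per_year * n))
    return records


for n in (10, 20, 30, 40, 50):
    recs = make_subsampled_records(n, seed=n)
    print(n, len(recs), recs[0][1], recs[0][0][:5])
```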
Figure 11: The effect of the rainfall record length on daily rainfall IDF curves estimated using RainyDay with
the CPC-Unified daily rainfall over Iowa, United States. Each panel shows the ensemble mean (solid lines) for
ten independent runs of RainyDay. The shaded areas denote the maximum spread across the ten runs. Key
RainyDay parameters: m=10n storms (where n varies by specified record length), A’=[40° to 44° N, 90° to 96°
W], A is a 0.5° by 0.5° box, N=100, T_max=1000, t_H=1 day, spatially-uniform transposition and Poisson-based
temporal resampling. Analyses are restricted to the April-November period.


Substantially more variability is evident in the ensemble mean and spread of the IDF estimates
using 10 years of CPC-Unified data than using 20 years, while the change in variability is negligible
between runs using 20 years and 30 years of data (Figure 11). We also examined the variability
of relative deviations in the ensemble IDF means, minima, and maxima from RainyDay between
the n-year runs and IDFs based on the full 67-year dataset (Figure 12). The boxplots show that
the majority of the deviations in the n-year IDF ensemble means, minima, and maxima are less
than 10% and that the vast majority are less than 20% for any given p_e. For most p_e, there are
substantial reductions in deviation when the records increase in length from n = 10 to n = 20
years. The reductions in deviation are smaller when the record length increases beyond 20 years.
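The relative deviations summarized in Figure 12 can be computed as (x_n - x_full) / x_full, where x_n is an n-year estimate and x_full the corresponding full-record estimate at the same p_e. A sketch with hypothetical helper names and made-up numbers, counting how many runs fall within the 10% and 20% thresholds mentioned above:

```python
import numpy as np


def relative_deviation(estimates, reference):
    """Relative deviation of n-year estimates from the full-record reference."""
    est = np.asarray(estimates, dtype=float)
    ref = np.asarray(reference, dtype=float)
    return (est - ref) / ref


def share_within(deviations, tol):
    """Fraction of deviations whose magnitude is below tol (e.g. 0.10 for 10%)."""
    return float(np.mean(np.abs(np.asarray(deviations)) < tol))


# Hypothetical ensemble-mean depths (mm) at one p_e from five n-year runs,
# compared with a hypothetical full-record value of 100 mm.
dev = relative_deviation([95.0, 102.0, 108.0, 115.0, 125.0], 100.0)
print(share_within(dev, 0.10), share_within(dev, 0.20))
```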
Figure 12: The effect of rainfall record length on variability in daily rainfall IDF estimated using RainyDay
with CPC-Unified data over Iowa, United States for 0.1, 0.05, 0.01, 0.005, and 0.001 exceedance probabilities.
Each boxplot shows the variability of a particular rainfall quantity at a given exceedance probability across
25 independent runs of RainyDay. Specific rainfall quantities shown are the ensemble mean (top panel),
ensemble maximum (middle panel), and ensemble minimum (bottom panel). Boxes denote the lower and
upper quartiles and whiskers extend at most 1.5 times the interquartile range beyond the quartiles. Key RainyDay parameters:
m=10n storms (where n varies by specified record length), A’=[40° to 44° N, 90° to 96° W], A is a 0.5° by 0.5°
box, N=100, T_max=1000, t_H=1 day, spatially-uniform transposition and Poisson-based temporal resampling.
Analyses are restricted to the April-November period.
Unless the intensity of the rainfall inputs is perturbed stochastically, SST-based frequency
analyses have an inherent upper bound. This upper bound corresponds to the most intense
rainstorm in the storm catalog transposed in such a way that rainfall over A is maximized. The
lack of positive deviations in the ensemble maxima at p_e = 10^-3 (middle panel of Figure 12; also
in certain realizations shown in Figure 11) shows where the SST procedure “encounters” this
upper limit.
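The upper bound described above can be illustrated by searching every placement of A within a storm's rainfall grid and keeping the largest areal mean. This is a simplified sketch, assuming A is a rectangle of grid cells that must lie entirely within the storm grid; real watersheds are irregular masks, and the helper names are hypothetical. A 2-D cumulative sum (summed-area table) makes every window sum an O(1) lookup.

```python
import numpy as np


def max_areal_mean(rain, a_rows, a_cols):
    """Largest mean depth over any a_rows x a_cols window of a storm's rainfall grid."""
    r = np.asarray(rain, dtype=float)
    # Summed-area table with a zero border: c[i, j] = r[:i, :j].sum()
    c = np.zeros((r.shape[0] + 1, r.shape[1] + 1))
    c[1:, 1:] = r.cumsum(axis=0).cumsum(axis=1)
    # Window sums for every placement where A lies fully inside the grid
    sums = (c[a_rows:, a_cols:] - c[:-a_rows, a_cols:]
            - c[a_rows:, :-a_cols] + c[:-a_rows, :-a_cols])
    return float(sums.max()) / (a_rows * a_cols)


def catalog_upper_bound(storms, a_rows, a_cols):
    """Upper bound on A-averaged depth attainable by transposing any catalog storm."""
    return max(max_areal_mean(s, a_rows, a_cols) for s in storms)


# Two hypothetical storm-total grids (mm) and a 2 x 2-cell area A.
storm_1 = [[0, 0, 0], [0, 4, 2], [0, 2, 8]]
storm_2 = np.full((3, 3), 3.0)
print(catalog_upper_bound([storm_1, storm_2], 2, 2))  # prints 4.0
```

No synthetic year can exceed this value without perturbing storm intensities, which is why the ensemble maxima flatten out at the rarest exceedance probabilities.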


While the results in this section are by no means exhaustive and the conclusions are specific to 
the Iowa study region and could vary in different physiographic regions, they nonetheless 
suggest that concerns over the use of relatively short remote sensing records with SST may be 
overstated and that remote sensing datasets, many of which are approaching 20 years in length, 
should provide relatively robust estimates that will improve as these datasets continue to grow in 
length. This emphasizes the fact that rainfall events that would be considered rare from the 
perspective of a single location or watershed can occur relatively frequently from a regional 
perspective. This is qualitatively consistent with the findings of Troutman and Karlinger (2003), 
who estimate that a flood with p_e = 10^-2 occurs on average every 4.5 years at at least one of the
193 USGS stream gage sites in their Puget Sound study region.
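The regional-versus-local argument can be made concrete with a back-of-the-envelope calculation. Assuming, purely for illustration, that the 4° by 6° domain A’ used above is split into 96 independent 0.5° boxes, the chance that at least one box exceeds its own p_e = 0.01 depth in a given year is 1 - (1 - p_e)^k, roughly 0.62 for k = 96. Neighboring boxes are positively correlated in practice, so the true value is typically lower, though still far above p_e. The function name is hypothetical.

```python
def prob_any_exceedance(p_e, k):
    """Chance that at least one of k independent boxes exceeds its own p_e quantile in a year."""
    return 1.0 - (1.0 - p_e) ** k


k = 8 * 12  # 4 deg x 6 deg domain split into 0.5 deg x 0.5 deg boxes
p_any = prob_any_exceedance(0.01, k)
print(round(p_any, 2), round(1.0 / p_any, 1))  # annual probability, mean waiting time (years)
```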


A potentially important issue related to short data records in SST, previously mentioned in
Section 3.3, can arise if, instead of assuming that the probability of storm occurrence is uniform
across the transposition domain, non-uniform spatial transposition is used (such as the
approach used in Wilson and Foufoula-Georgiou, 1990 or the optional scheme in RainyDay
described in Section 3.3). Using the bootstrapping approach with the CPC-Unified dataset
described above, visual inspection of storm probability-of-occurrence maps such as the one
shown in Figure 3 reveals that there can be substantial variations in the spatial distribution of
historical storms when rainfall records are short (results not shown). These variations tend to
diminish as the length of record increases, as do their impacts on IDF estimates. More variation
is evident in the median IDFs from ten independent runs of RainyDay, for example, using
non-uniform transposition than using uniform transposition when n=10 years (Figure 13, left panels).
When using non-uniform transposition, variability diminishes when n=20 years and a systematic
increase in rainfall intensity for p_e > 0.02, relative to the uniform transposition case, emerges
(Figure 13, right panels). Given these results, we recommend that the assumption of uniform
transposition be used in the absence of strong physically-based reasoning and observational
support for non-uniform transposition. It is possible, however, that the uniform transposition
assumption explains the IDF underestimation by RainyDay with Stage IV for high p_e relative to
Atlas 14 shown in Figure 4, where uniform spatial transposition was used.
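The difference between uniform and non-uniform transposition comes down to how transposition targets are drawn. In the sketch below (an assumption, not the RainyDay scheme of Section 3.3), the non-uniform map is a raw histogram of storm centroid locations with an optional constant floor. With only a few storms, a handful of cells dominate the map and cells with no observed storm can receive zero probability, which is one way short records make non-uniform transposition unstable. Function names are hypothetical.

```python
import numpy as np


def occurrence_probabilities(centroid_cells, shape, floor=0.0):
    """Empirical storm-occurrence map over the transposition grid.

    Counts storm centroids per cell, adds a constant `floor` to every cell
    (floor > 0 keeps cells with no observed storms possible), and normalizes.
    """
    counts = np.full(shape, float(floor))
    for row, col in centroid_cells:
        counts[row, col] += 1.0
    total = counts.sum()
    if total == 0:
        raise ValueError("no storms and zero floor: map is undefined")
    return counts / total


def sample_cells(prob, k, rng):
    """Draw k transposition targets (row, col) with per-cell probabilities `prob`."""
    flat = rng.choice(prob.size, size=k, p=prob.ravel())
    return np.column_stack(np.unravel_index(flat, prob.shape))


rng = np.random.default_rng(0)
shape = (4, 6)
uniform = np.full(shape, 1.0 / (shape[0] * shape[1]))
# Hypothetical short record: only three storm centroids observed.
nonuniform = occurrence_probabilities([(1, 2), (1, 2), (3, 5)], shape)
print(sample_cells(uniform, 5, rng))
print(sample_cells(nonuniform, 5, rng))
```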
Figure 13: The effect of the spatial transposition scheme on daily rainfall IDF curves estimated using
RainyDay with the CPC-Unified daily rainfall over Iowa, United States. Each panel shows the ensemble mean
(solid lines) for ten independent runs of RainyDay. The shaded areas denote the maximum spread across the
ten runs. The specific years that comprise the input dataset vary. Key RainyDay parameters: m=10n storms
(where n varies by specified record length), A’=[40° to 44° N, 90° to 96° W], A is a 0.5° by 0.5° box, N=100,
T_max=1000, t_H=1 day. Poisson-based temporal resampling is used. Analyses are restricted to the
April-November period.


As mentioned previously, RainyDay supports either the Poisson-based resampling that has
traditionally been used with SST or an empirical scheme described in Section 3.4. There do not
appear to be substantial systematic differences between the results from RainyDay using these
two schemes with 10-year records (Figure 14, left panels), but, similar to Figure 13, when 20-year
records are used, there is a tendency toward higher rainfall estimates for p_e > 0.02. Results
may differ in other regions, particularly where temporal clustering of storms is very strong or
where rainstorms are very infrequent. It is recommended that the modeler assess clustering using
an independent long-term rainfall data source if available, in addition to assessing sensitivity to
this option in RainyDay. As with the spatial transposition schemes, the choice of temporal
resampling scheme does not appear to have a substantial impact on low p_e estimates.
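The two temporal resampling options differ in how the number of storms per synthetic year is generated. The sketch assumes one plausible form of the empirical scheme (resampling observed annual storm counts with replacement); the actual scheme is described in Section 3.4 and may differ. Function names and the observed counts are hypothetical.

```python
import numpy as np


def poisson_counts(lam, n_years, rng):
    """Storms per synthetic year drawn from Poisson(lam), as traditionally used in SST."""
    return rng.poisson(lam, size=n_years)


def empirical_counts(observed_counts, n_years, rng):
    """Storms per synthetic year resampled (with replacement) from observed annual counts."""
    return rng.choice(np.asarray(observed_counts), size=n_years, replace=True)


rng = np.random.default_rng(42)
# Hypothetical observed catalog: storms per year in a 10-year record (mean 10).
observed = [4, 18, 7, 12, 3, 15, 9, 6, 14, 12]
pois = poisson_counts(np.mean(observed), 10_000, rng)
emp = empirical_counts(observed, 10_000, rng)
print(pois.mean(), pois.var())  # variance close to the mean
print(emp.mean(), emp.var())    # keeps the spread of the observed counts
```

When observed counts are more variable than a Poisson distribution allows (temporal clustering), the empirical draws preserve that extra spread, which is one reason the two schemes could diverge where clustering is strong.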
Figure 14: The effect of the temporal resampling scheme on daily rainfall IDF curves estimated using
RainyDay with the CPC-Unified daily rainfall over Iowa, United States. Each panel shows the ensemble mean
(solid lines) for ten independent runs of RainyDay. The shaded areas denote the maximum spread across the
ten runs. The specific years that comprise the input dataset vary. Key RainyDay parameters: m=10n storms
(where n varies by specified record length), A’=[40° to 44° N, 90° to 96° W], A is a 0.5° by 0.5° box, N=100,
T_max=1000, t_H=1 day. Spatially uniform transposition is used. Analyses are restricted to the
April-November period.


We also examine the sensitivity of RainyDay results to the size of A’ (Figure 15). To do so, we 
run RainyDay for various square domains ranging from 1° by 1° up to 10° by 10°, while holding 
A fixed at a 0.5° by 0.5° box. Then the evolution of rainfall intensity is examined for a range of
p_e as a function of A’. This is repeated for several different record lengths and for two values of
λ. Interestingly, while there is a general tendency for intensity estimates to stabilize as A’ grows,
the behavior is not asymptotic (though roughly so for n=68 years). The high exceedance
probability estimates (p_e = 0.5) tend to be stable over a large range of A’ and then decrease for
very large values, due to the tendency for synthetic years to be created in which no storm is
transposed directly over A. This is the root of potential low biases mentioned in Step 2 of the
SST procedure described in Section 2. However, Figure 15 demonstrates that this tendency for a
decrease in intensity estimates for large A’ extends to smaller p_e values as well, and that there is a
critical value of A’ at which the estimated intensity is roughly maximized. This critical value
appears to vary more by the particular period of record than by the length of record. For
example, the 20-year record from 1976-1995 yielded a critical value of A’ that is lower than the
critical value from the 20-year record from 1996-2015. This points to the fact that the existence and
number of major storms within A’ during the record period is very important (Wright et al.,
2014b reached the same conclusion through different means).


These results also indicate that increasing m (thus increasing λ) can mitigate the reduction in
estimated intensity for values of A’ larger than the critical value. This result suggests that, if the
modeler is interested in hazard estimation across a range of p_e, he or she should choose a
relatively large m. A diagnostic framework within the RainyDay software to identify this critical
value of A’ for a given value of m (or vice versa) for different p_e would be useful but does not
currently exist.
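The mechanism behind the decrease for large A’, and the mitigating effect of a larger λ, can be approximated with an idealized calculation. Assume square A and square storm footprints, storm centers placed uniformly in A’ with edge effects ignored, and Poisson(λ) storms per year. A storm then touches A with probability q = (a + w)^2 / |A’|, where a is the side of A and w the footprint width; the number of storms touching A in a year is Poisson(λq), so the chance of a year with no storm over A is exp(-λq). The helper and the 2° footprint used in the example are hypothetical, and areas are treated as planar square degrees.

```python
import math


def prob_year_without_hit(lam, a_side, w, aprime_area):
    """P(no storm touches A in a synthetic year) under an idealized geometry.

    a_side: side length of square A; w: side of a square storm footprint;
    aprime_area: area of A'. Storm centers are uniform over A' (edge effects
    ignored), so one storm touches A with probability q = (a_side + w)^2 / |A'|.
    With Poisson(lam) storms per year, touches are Poisson(lam * q).
    """
    q = min(1.0, (a_side + w) ** 2 / aprime_area)
    return math.exp(-lam * q)


# Hypothetical: A is 0.5 deg x 0.5 deg, storm footprints are 2 deg wide.
for side in (4.0, 10.0):
    for lam in (10, 20):
        print(f"A' {side:.0f}x{side:.0f}, lambda={lam}: "
              f"{prob_year_without_hit(lam, 0.5, 2.0, side ** 2):.3f}")
```

Under these assumptions, about 2% of synthetic years lack a storm over A when A’ is 4° by 4° and λ = 10, but about 54% do when A’ is 10° by 10°; doubling λ to 20 reduces the latter to about 29%.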
Figure 15: The effect of the size of the transposition domain A’ on daily rainfall IDF curves estimated using
RainyDay with the CPC-Unified daily rainfall over Iowa, United States using a range of record lengths. Key
RainyDay parameters: m=10n storms (where n varies by specified record length), A’ is a square of varying
size, A is a 0.5° by 0.5° box, N=100, T_max=1000, t_H=1 day, spatially-uniform transposition and Poisson-based
temporal resampling. Analyses are restricted to the April-November period.
5. Discussion and Conclusions 


In this paper we introduce RainyDay, a Python-based platform that couples rainfall remote 
sensing data with a technique known as Stochastic Storm Transposition (SST) that effectively 
“lengthens” the extreme rainfall record through temporal resampling and spatial transposition of 
observed rainstorms. It produces probabilistic extreme rainfall scenarios that include realistic 
estimates of rainfall duration, intensity, and space-time structure that can be used for
probabilistic flood and landslide hazard and risk assessment at a wide range of scales.


The SST technique, as implemented in RainyDay, has two important features that distinguish it 
from IDF and design storm methods for describing the relationships between the intensity, 
duration, and structure of extreme rainfall. First, it leverages the detailed picture of rainfall 
space-time structure offered by ground-based radar or satellite-based sensors. This structure can 
play an important role in landslides and floods because the variability in the concentration and 
intermittency of extreme rainfall in space and time can lead to a complex and diverse spectrum 
of hazard response. This structure is difficult to measure using rain gages due to the high gage 
densities and sampling rates required, and so rain gage-based methods for analysis of rainfall- 
driven hazards, such as IDF relations and design storm methods, typically neglect this higher- 
order variability. The reader is directed to Wright et al. (2014b) for a deeper examination of this
feature in the context of urban flood hazards.


The second important feature of RainyDay is that, because of the near-global coverage of 
satellite rainfall datasets, it is possible to generate realistic representations of extreme rainfall in 
remote or poorly-instrumented regions where rain gage or stream gage records are lacking. Such 
regions are common even in wealthy nations and are ubiquitous in developing countries, many of 
which are characterized by rapidly-growing exposure to rainfall-driven hazards due to
urbanization and climate change. The authors are not aware of other approaches that offer the
ability to generate realistic rainfall inputs for probabilistic hazard modeling nearly anywhere on
the globe with minimal computational effort.


Despite the advantages that SST and RainyDay offer over some other methods for assessing 
rainfall-driven hazards (e.g. design storms, discharge frequency analysis), a number of 
limitations and unanswered questions remain. Perhaps the biggest limitation to coupling SST 
with rainfall remote sensing, and to remote sensing applications more broadly, is the uncertain 
accuracy of the input rainfall data. Significant efforts have been made to better understand and 
minimize the errors in remote sensing estimates of rainfall, both from satellites (e.g. Petty and 
Krajewski, 1996; Tian and Peters-Lidard, 2007; Tian et al., 2009) and from ground-based radar
(e.g. Villarini and Krajewski, 2010). Such studies demonstrate that remote sensing estimates can 
vary significantly from reference observations in terms of rainfall intensity bias and 
differentiation between rainy and non-rainy areas, with important implications for hazard 
applications. In the case of satellite-based rainfall estimates, heterogeneities in the underlying 
land or water surfaces can be difficult to distinguish from variations in cloud and rainfall 
properties (e.g. Ferraro et al., 2013), while both ground-based radar and space-based sensors tend 
to suffer in mountainous areas due to dramatic variations in rainfall physical properties over 
short time and length scales. Furthermore, the spatial and temporal resolution of remote sensing 
estimates, particularly from satellites, can be too coarse for modeling at very small scales, 
especially in urban areas and fast-responding mountain or desert catchments where surface 
runoff generation from intense, short-duration rainfall on sub-hourly, sub-kilometer scales can be 
a key driver of hazards. The uncertainties associated with rainfall remote sensing data pose 
serious challenges for flood or landslide forecasting and monitoring, which require accurate 
rainfall estimates in real-time. These issues may be somewhat less critical in the SST framework 
or in long-term hazard assessment more generally, since the rainfall estimates need only have 
fidelity in the statistical sense. SST will be somewhat robust to random errors in rainfall data, as 
the underestimation of rainfall intensity from some storms in the storm catalog can be 
compensated by overestimation of rainfall intensity from others. In contrast, SST is not robust to 
systematic rainfall biases, as demonstrated in several examples in this paper. IMERG, NASA’s
newest satellite multi-sensor dataset, will feature improved accuracy and relatively high
resolution (0.1°, 30-minute), addressing some of these issues once the full retrospective dataset
from 1998-present becomes available.


In the case of flood hazard modeling using SST, a practical upper limit on the size of the area of 
interest A can arise. As mentioned in Sections 2.1 and 3.3, the sizes of A and A’ can be limited 
due to the challenges posed by transposition in the presence of complex terrain features. 
Furthermore, as A becomes larger, the rainfall duration t_H needed to properly model hazard
response becomes longer. While RainyDay does not restrict the choice of t_H, practical limitations
exist. In large watersheds, floods are usually the result of specific space-time arrangements of
multiple distinct storm systems over the span of perhaps a week and up to months, often linked
to persistent large-scale atmospheric disturbances. One could specify a long t_H (a month, for
example) in RainyDay to “capture” all of these storm systems within a single storm catalog
entry. Such a long t_H, however, means there could only be relatively few entries in the storm
catalog, given the limited record length of the input dataset. Such an approach would be
constrained by the few space-time configurations of these storm systems that were actually
observed, while other non-observed configurations are hypothetically possible. A tradeoff thus
emerges as A (and thus t_H) increases relative to the area of the transposition domain A’. If A is a
large fraction of A’, then there is little opportunity to leverage the “space-for-time” substitution 
that is at the core of the SST approach. If the user instead decides to increase the size of A’, he or 
she must ensure that this transposition is performed in a realistic manner. This effectively 
precludes modeling of regions that approach continental scales. The maximum scale at which 
SST can be feasibly used is an open question with no simple answer. It should be noted that IDF 
and design storm methods face similar and perhaps even more acute limitations in terms of an 
upper area limit, though for different reasons (e.g. conceptual and practical shortcomings of
point-based temporal rainfall distributions and area reduction factors).


As mentioned in Section 4.3, a common critique of the methodology presented in this study is 
that the relatively short remote sensing records may not contain enough truly extreme rainfall 
events. Sensitivity to record length is not unique to SST; frequency estimates of rare hazards will 
be driven by the largest several events in the historical record, regardless of the chosen analysis
technique. The results in Section 4.3 suggest that this concern may be somewhat
exaggerated in the case of SST since very extreme rainfall events that are considered rare from a
local viewpoint can occur much more frequently when viewed regionally. Like more commonly- 
used regionalization techniques, SST helps to leverage this fact to improve hazard analysis. As 
the rainfall remote sensing record grows, the robustness of estimates produced by SST and 
RainyDay should increase as additional extreme storms are observed (and as their accuracy 
improves due to technological advances). Estimates of rainfall intensity will improve more per 
unit of time using SST than using point-based techniques due to SST’s regional nature, while 
new patterns of rainfall space-time structure will add to the realism of SST-based flood and 
landslide hazard estimates since a broader spectrum of hazard outcomes will be possible. 
RainyDay makes such updating simple, while IDF databases and design storm methods are 
generally updated through slow and costly procedures (Y. Zhang, personal communication, May 14, 2015).


As highlighted in Section 4.2, SST and RainyDay have important features in the context of 
nonstationary hazards. Extreme rainfall scenarios from RainyDay are generally based on more 
recent observations than existing rain gage or stream gage-based frequency analyses such as 
Atlas 14 IDF relations, which contain older records that may not be representative of the current 
state of the climate. In this respect, hazard analysis based on RainyDay can be understood as a 
relatively current “snapshot” based on recent climate. The performance of RainyDay is very 
dependent on major storms having occurred one or more times within the transposition domain, 
however, meaning that spatial transposition is not a perfect remedy for short data records. 
Furthermore, if the rainfall remote sensing record deviates significantly from the true long-term 
properties of extreme rainfall over the region of interest due to random chance, decadal-scale 
climate variability, or systematic measurement bias, then caution must be exercised when using
RainyDay. It can be challenging in practice to diagnose such nonstationarities and biases due to a 
lack of long-term independent observational data, particularly in remote or underdeveloped 
regions. Meanwhile, as discussed in Wright et al. (2014b), combining SST (or other rainfall-based
approaches, e.g. Cunha et al., 2011) with a distributed hazard model allows the analyst to
incorporate changes in land use and land cover into hazard estimates.